Saturday, 24 November 2007

GSoC Acknowledgments

I would like to thank several people for their help in my Google Summer of Code project:

I also want to express my general gratitude to some particular organizations and projects:

Friday, 23 November 2007

GSoC Status Update: November'07

It has been three months since I posted the Final GSoC project status. It's time to report what happened with the remaining tasks.

The tasks that I've completed since then are:

  • Perform code profiling to find bottlenecks and deficiencies in mod_multicast. Improve the code.
  • Once I make all the possible optimizations: perform benchmarks to check mod_multicast's effect on CPU, RAM and traffic consumption.
The tasks that I haven't completed yet are:
  • Wait for ejabberd code reviewers, in case I need to fix any problem in my XEP-0033 patches for ejabberd before they are applied to ejabberd trunk.
  • Discuss potential security and spam vulnerabilities (talk in the JDEV and JADMIN mailing lists).
  • Add XEP33 support to ejabberd's Pub/Sub and/or PEP service once their codebase is stable.
  • Wait for Peter Saint-Andre's questions regarding his XEP-0033 update.
So this adventure has not ended yet.

Thursday, 4 October 2007

Travel to Costa Rica

I am travelling to San José, Costa Rica for a whole week. The expected return is on Sunday, 15th of October.

Tuesday, 2 October 2007

Results of mod_multicast execution time with timer:tc

I previously presented timer:tc and multicast_test. Now I'll illustrate some results I obtained with them.

I ran 6 experiments, combining normal routing, trusted multicast and untrusted multicast with both single and multiple servers. Each experiment was run with several numbers of destination addresses.

The time is expressed in microseconds (1 second = 1,000,000 microseconds). Note that the time shown here includes not only mod_multicast, but also packet building and ejabberd_router.
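For reference, a measurement like the ones below can be taken with timer:tc from the Erlang shell. This is only a minimal sketch, assuming the multicast_test entry points described later (testn/testt/testu); the actual runs used the helper functions of multicast_test:

% timer:tc(Module, Function, Args) runs the call and returns
% {Microseconds, Result}.
{Time, _Result} = timer:tc(multicast_test, testt, [single, 300]),
io:format("300 destinations took ~p microseconds (~.2f per destination)~n",
          [Time, Time / 300]).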


Results

Normal - Single

Destinations  Exec Time (microseconds)  Time per destination (microseconds)
1             161                       161.00
101           6074                      60.14
201           22163                     110.26
301           40950                     136.05
401           74586                     186.00
501           100445                    200.49
601           107012                    178.06
701           131163                    187.11
801           146473                    182.86
901           153147                    169.97

Normal - Multiple
Destinations  Exec Time (microseconds)  Time per destination (microseconds)
1             196                       196.00
101           11206                     110.95
201           20918                     104.07
301           30811                     102.36
401           43627                     108.80
501           52742                     105.27
601           59650                     99.25
701           75291                     107.41
801           81077                     101.22
901           80192                     89.00

Trusted - Single
Destinations  Exec Time (microseconds)  Time per destination (microseconds)
1             226                       226.00
101           8719                      86.33
201           39938                     198.70
301           77040                     255.95
401           101675                    253.55
501           122156                    243.82
601           144422                    240.30
701           158917                    226.70
801           186553                    232.90
901           197387                    219.08

Trusted - Multiple
Destinations  Exec Time (microseconds)  Time per destination (microseconds)
1             283                       283.00
101           21751                     215.36
201           41077                     204.36
301           66047                     219.43
401           102542                    255.72
501           315313                    629.37
601           140158                    233.21
701           164836                    235.14
801           171249                    213.79
901           189712                    210.56

Untrusted - Single
Destinations  Exec Time (microseconds)  Time per destination (microseconds)
1             317                       317.00
101           25528                     252.75
201           79402                     395.03
301           155075                    515.20
401           284532                    709.56
501           329484                    657.65
601           357879                    595.47
701           388755                    554.57
801           414622                    517.63
901           506557                    562.22

Untrusted - Multiple
Destinations  Exec Time (microseconds)  Time per destination (microseconds)
1             1121                      1121.00
101           115736                    1157.36
201           331379                    1656.89
301           601325                    2004.42
400           2478072                   6195.18
...


Analysis

What does this show? The computation time per destination address remains roughly constant as the number of destinations grows. The fluctuations are mostly due to my computer being used for other tasks while the experiments ran [1].

For example, processing a packet with 300 destination addresses (few rosters and chatrooms have more than this number of items) costs around 77 milliseconds if the source is trusted (a local MUC service or session manager). The same packet costs 155 milliseconds if the source is not trusted and the packet must be carefully inspected.

On the less bright side of the results, the Multiple experiments again perform quite badly compared to Single. Note that part of this is due to the function add_addresses, as explained in the Fprof article. Another part of the problem is not in the server but in the stressing tool: the packet-building function I implemented in multicast_test uses non-existent server names when run with the 'multiple' option: "5.localhost", "7.localhost"...


Conclusions

In summary, the results I obtained using timer:tc are compatible with the previous results I obtained with Fprof and Jabsimul. All of them indicate that, on average, mod_multicast consumes approximately as much time as the ejabberd routing functions do.

So, using multicast increases CPU consumption, as expected. This cost is acceptable once the benefits of multicasting are taken into account.

---
[1] Statistically speaking, it would be preferable to run a batch of experiments and show only the average and confidence interval, but I guess this is not really required for now.

Results of code profiling mod_multicast with Fprof

I previously presented Fprof and multicast_test. Now I'll illustrate some results I obtained with them.

I tried normal routing, trusted multicast and untrusted multicast, with both single and multiple servers, and a fixed number of 300 destination addresses.

All those experiments generate a lot of information, so I summarize here only the important results. Times are in milliseconds, and measure the full processing time in the system where the experiments were run.


Results

Normal - Single

Total: 320
build_packet: 30
ejabberd_router: 275

Normal - Multiple

Total: 630
build_packet: 30
ejabberd_router: 593

Trusted - Single

Total: 410
build_packet: 30
ejabberd_router: 266
mod_multicast: 114
string_to_jid already requires: 52

Trusted - Multiple

Total: 2020
build_packet: 30
ejabberd_router: 700
mod_multicast: 1290
add_addresses consumes too much here: 1067

Untrusted - Single

Total: 500
build_packet: 30
ejabberd_router: 270
mod_multicast: 200
string_to_jid requires: 50

Untrusted - Multiple

Total: 2060
build_packet: 30
ejabberd_router: 640
mod_multicast: 1390
add_addresses consumes too much here: 1070


Analysis

In those examples, using mod_multicast does not reduce the time consumed by ejabberd_router because the multicast packet is meant to be sent to local users, so the routing process is always called 300 times. If the destinations were not local and some of them were on the same servers, ejabberd_router would be called less frequently, so the usage of mod_multicast would be noticeable in that aspect too.

The Trusted multicast requires only half of the time required by ejabberd_router itself. In the case of Untrusted, the additional checks make mod_multicast as costly as ejabberd_router.

In the case of Trusted multicast, the function in mod_multicast that consumes the most processing time is jlib:string_to_jid. In Untrusted, the most problematic function is add_addresses. Maybe the code of that function can be improved.

Summarizing, I consider the mod_multicast code fairly efficient when compared to other parts of ejabberd, especially ejabberd_router.

multicast_test: analyze mod_multicast performance

To measure mod_multicast's computational cost I developed a small Erlang module: multicast_test.

This module includes functions to create a message packet with an XEP33 'addresses' element. The number of destinations is configurable, and the server of each destination can be 'single' (all destinations are on the same server) or 'multiple' (each destination is on a different server). This packet can be sent to route_trusted or route_untrusted. It is also possible to send individual packets to ejabberd_router.
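As an illustration of what such a builder can look like, here is a minimal sketch using ejabberd's classic {xmlelement, Name, Attrs, Children} tuples. The module and function names are hypothetical, not the exact multicast_test code:

-module(multicast_test_sketch).
-export([build_packet/2]).

%% Build a message stanza with an XEP-0033 'addresses' element.
%% 'single' puts every destination on the same server; 'multiple' puts each
%% destination on a different (possibly non-existent) server.
build_packet(Servers, NumDests) ->
    Addresses = [address_el(jid(Servers, N)) || N <- lists:seq(1, NumDests)],
    {xmlelement, "message", [{"type", "chat"}],
     [{xmlelement, "body", [], [{xmlcdata, "test"}]},
      {xmlelement, "addresses",
       [{"xmlns", "http://jabber.org/protocol/address"}],
       Addresses}]}.

jid(single, N)   -> "user" ++ integer_to_list(N) ++ "@localhost";
jid(multiple, N) -> "user" ++ integer_to_list(N) ++ "@"
                        ++ integer_to_list(N) ++ ".localhost".

address_el(Jid) ->
    {xmlelement, "address", [{"type", "bcc"}, {"jid", Jid}], []}.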

Code profiling

It is possible to run those functions with Fprof to profile the time consumed by each function in mod_multicast:

fprof:apply(multicast_test, ROUTING, [SERVERS, NUM_DESTS]).
fprof:profile().
fprof:analyse([{dest, []}]).
Where:
  • ROUTING: testn for normal routing, testt for trusted sender, and testu for untrusted sender.
  • SERVERS: single for just a single server, multiple for a different server for each destination address.
  • NUM_DESTS: number of destination addresses.
For example, execute this in the Erlang shell of the ejabberd node:
fprof:apply(multicast_test, testu, [single, 300]).
fprof:profile().
fprof:analyse([{dest, []}]).
And you will get a file fprof.analysis with very detailed information.

Execution time

It's also possible to measure the execution time of those functions with a varying number of destinations. This will show whether the performance of mod_multicast is dependent on the number of destinations or the number of destination servers...

The functions are:
multicast_test:ROUTING(SERVERS, INI, END, INC).
Where:
  • ROUTING: normal, trusted, untrusted.
  • SERVERS: single or multiple.
  • INI, END and INC: the initial number of destinations, the ending value and the increment.
For example:
multicast_test:untrusted(single, 1, 1000, 100).
I will later post some results I obtained using those functions.

Results of stressing ejabberd+mod_multicast with Jabsimul

I tested mod_multicast on a live server, stressing it with synthetically generated load using Jabsimul.

The setup was: create 300 accounts with Testsuite's userreg; create a Shared Roster Group with @all@; configure Jabsimul to log in to all 300 accounts and change presence every 60 seconds.

Note that each user has 299 contacts online, and consequently each presence change generates a presence packet with 299 destination addresses.

I ran Jabsimul against an unaltered ejabberd trunk, and also against a mod_multicast-enabled version. The CPU consumption in both cases varied between 20% and 30%. I couldn't find a clear difference between enabling or disabling mod_multicast. The virtual memory was around 125 MB, and resident memory around 105 MB.

It seems the computational resources required by mod_multicast are only a small part of all the processing that takes place in ejabberd. So, this test indicates that enabling XEP33 in an ejabberd server probably does not have an appreciable impact on the server's CPU or RAM consumption.

The computer used in the tests:

  • AMD Athlon(tm) 64 Processor 3000+, 4.000 Bogomips
  • 1 GB RAM
  • Debian sid
  • Linux 2.6.22-2-686 (Debian package)
  • Erlang R11B-5 (Debian package)
  • ejabberd SVN r952
  • mod_multicast SVN r394

If you run your own tests and find a differing conclusion, please tell me.

PS: Some instructions to get Jabber Test Suite and Jabsimul: Benchmarking Jabber/XMPP Servers with Jabsimul

Monday, 1 October 2007

Major performance optimizations in mod_multicast

When I finished my Google Summer of Code 2007 project about implementing XEP-0033: Extended Stanza Addressing in ejabberd, I ran some benchmarks and found that my multicast module performed really badly.

It was explained in ejabberd gets XEP-0033: Extended Stanza Addressing. For example, CPU consumption was multiplied by 3 when users sent XEP33 packets to around 40 destinations.

This result was not surprising since my focus during the GSoC time was to implement XEP33 correctly, not efficiently. I added as a post-GSoC task to perform code profiling, find bottlenecks and deficiencies in mod_multicast, and improve the code accordingly.

Used tools

During September I learned about several tools in Erlang/OTP to analyze Erlang source code. After experimenting with them in several of my ejabberd contributions, this weekend I decided it was time to come back to mod_multicast.

The tools I used for code profiling are:

  • Debugger: graphical tool which can be used for debugging and testing of Erlang programs.
  • Dialyzer: static analysis tool that identifies software discrepancies such as type errors, unreachable code, unnecessary tests.
  • Cover: coverage analysis tool for Erlang. Shows how many times each line of the source code file is executed.
  • Fprof: profiling tool that can be used to get a picture of how much processing time different functions consume, and in which processes.
For benchmarking, I used those tools:
  • Timer:tc: Measure the elapsed real time while executing a function.
  • Testsuite: to create a lot of Jabber accounts.
  • ejabberd's mod_shared_roster: to populate the rosters with a lot of contacts.
  • Jabsimul: stress the server sending constant presence changes.
  • top: view ejabberd consumption of CPU and RAM.
  • etop: similar to top, but to view Erlang processes.

Performance improvements

The part of mod_multicast packet processing that consumed the most time was the traversal and formatting of the list of destination addresses. One task was especially time-consuming in my early code: conversion of strings to JIDs. And to make things worse, each destination address was converted from string to JID several times, for stupid reasons.

Yesterday I rewrote and reorganized a lot of the code that handles multicast packets. I'll now describe the important changes.

* Software engineering 101

Replace the do_route* control-passing functions with a main function, route_untrusted, that keeps control and calls worker functions.

* Erlang/OTP Programming with Style 101

Use Throw and Catch.

* Route_untrusted

The function route_untrusted is used for any multicast packet that was sent from an untrusted source (local user, remote user, remote server, remote service). Such a packet is completely checked: access permissions, packet format, limit on the number of destinations, and packet relay.
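A simplified sketch of that style (hypothetical helper names; not the actual mod_multicast code): the worker functions throw an error condition, and route_untrusted catches it in a single place:

route_untrusted(From, To, Packet, State) ->
    try
        check_access(From, State),          % throws {error, forbidden} if denied
        Addresses = get_addresses(Packet),  % throws if the XEP-0033 format is wrong
        check_limit(From, Addresses),       % throws if there are too many destinations
        check_relay(From, Addresses),       % throws if relaying would be required
        route_common(From, To, Packet, Addresses, State)
    catch
        throw:{error, Condition} ->
            %% every validation failure ends up here, in one place
            send_error_reply(From, To, Packet, Condition)
    end.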

* Route_trusted

The function route_trusted is used by local services like MUC and the session manager. Since the source of the packet is trusted by the multicast service, the packet is not checked.

* Route_common

The function route_common performs the main processing tasks: find a multicast service for each remote server (either in cache or start the query process), and send the packets.

* Packet prebuilding

There are two important improvements in the packet-building task. First, the set of 'address' elements that represents each group of destinations is built once, at the start, not for every packet sent. This is where the 'delivered=true' attribute is added.

Second, each group of destinations gets a prebuilt packet that already includes the addresses of all the other groups (with the 'delivered=true' attribute already present).

Finally, the list of groups is traversed, and for each one the only remaining duty is to build the final packet (by simply concatenating the lists of addresses) and route it to the destination.

This implementation has a complexity of order N, while the old implementation had a complexity of order N^2.
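A rough sketch of the new approach (hypothetical helper names and a simplified handling of the 'delivered' attribute, not the actual mod_multicast code): the per-group 'address' elements are built once, and each final packet is then just a concatenation:

send_to_groups(From, Packet, Addresses) ->
    Groups = group_by_server(Addresses),                  % [{Server, [AddressEl]}]
    %% Build each group's 'delivered=true' copies exactly once.
    Delivered = [{Server, mark_delivered(Els)} || {Server, Els} <- Groups],
    lists:foreach(
      fun({Server, OwnEls}) ->
              %% Addresses of all the *other* groups, already marked delivered.
              OtherEls = lists:append([Els || {S, Els} <- Delivered, S =/= Server]),
              FinalPacket = add_addresses(Packet, OwnEls ++ OtherEls),
              route_to_server(From, Server, FinalPacket)
      end, Groups).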


Conclusions

I've described the nature of the performance improvements in mod_multicast. I'll soon describe how I ran the benchmarking tools, and the observed results.

Monday, 20 August 2007

Summary of my GSoC project

The Google Summer of Code 2007 has finished. It's time to summarize the results.

Patches submitted to the ejabberd bug tracker


Protocols that I have (partially) read during my project

Collateral tasks

  • I reported to Peter Saint-Andre all the errors that I found in the XEPs.
  • During my implementation of XEP-0033 I wrote several blog posts proposing changes in this protocol. Peter Saint-Andre will use those texts to update the protocol.
  • Start to use Emacs to format Erlang code (emacs-mode) and commit to SVN repository (psvn).
  • Start to read Joe Armstrong's new book: Programming Erlang - Software for a Concurrent World. Apply the new knowledge while programming in my GSoC tasks.
  • Start to read GSoC gift book, Karl Fogel's Producing Open Source Software - How to Run a Successful Free Software Project. Apply the new knowledge in my ejabberd tasks.
  • Continue my involvement in ejabberd as usual, which includes being active in ejabberd's forum, chatroom, mailing list and ejabberd-modules contribution SVN repository.
  • Two trips, one of them international :)

Sunday, 19 August 2007

Final GSoC project status

A week ago I posted my Almost final GSoC project status.

Since the previous status update I completed those tasks:

  • Implement or update as much as possible XEP133 Service Administration in ejabberd.
  • Prepare and submit patches to ejabberd bug tracker.
The tasks that I haven't completed and my plans to complete them are, in no special order:
  • Perform code profiling to find bottlenecks and deficiencies in mod_multicast. Improve the code. - I'll focus on that topic from now on.
  • Once I make all the possible optimizations: perform benchmarks to check mod_multicast's effect on CPU, RAM and traffic consumption.
  • Wait for ejabberd code reviewers, in case I need to fix any problem in my patches before they are applied to ejabberd trunk.
  • Discuss potential security and spam vulnerabilities (talk in the JDEV and JADMIN mailing lists).
  • Add XEP33 support to ejabberd's Pub/Sub and/or PEP service once their codebase is stable.
  • Wait for Peter Saint-Andre's questions regarding his XEP-0033 update.
  • September 7th: Upload final code to Google Summer of Code hosting.
The Google Summer of Code 2007 has finished, so those remaining tasks fall out of the scope of my GSoC project timeline. However, I consider them important for my own personal project timeline. So you can expect me to work on all of them at some time.

ejabberd gets XEP-0033: Extended Stanza Addressing

I consider my GSoC task of implementing XEP-0033: Extended Stanza Addressing in ejabberd to be finished.

The implementation is divided into several parts:

The largest part of the code is in the multicast service (mod_multicast).

Tomorrow is the GSoC pencils down deadline. This means that I will be evaluated only for the code that I wrote until today.

I expect all my code to be eventually included in ejabberd trunk. However, I'll propose that mod_multicast is disabled by default in the example configuration. At least in the first ejabberd release that includes that module.


Benchmark

I made some benchmarks using Jabsimul. The only performance indexes that I could evaluate are the %CPU and MB of RAM consumed by the ejabberd program. I created 900 accounts, and populated each one with around 40 roster items of type 'both'. Then, using Jabsimul, each logged-in user changed its presence every few seconds.

With the patches and the multicast service enabled, with small rosters and small chatrooms (less than 5 contacts or participants) there's a small increase in CPU consumption. With medium-size rosters (40 roster items), the CPU consumption triples with respect to the stock ejabberd trunk version.

Obviously, I don't consider it acceptable for the CPU consumption to be multiplied by 3 just because all the packets use XEP33 with 40 destinations. The bottleneck is mod_multicast.

However, this result does not surprise me at all. During my GSoC coding I only cared about optimization in the patches that will be committed to ejabberd trunk: ejabberd_c2s, mod_muc_room and ejabberd_router_multicast. I didn't care about code optimizations in mod_multicast. For me, functional correctness was far more important. Now that mod_multicast works correctly, I can concentrate on improving it without breaking its correctness.

This planning allowed me to do all the stuff that I planned for my GSoC project, and finish the summer with correct and working code. During the last week of August I plan to profile, reorganize and improve mod_multicast to reduce its computational consumption as much as possible.


Unexpected improvement

The funny thing is that my patches to the ejabberd core, with the multicast service disabled, slightly reduce the CPU consumption compared to the stock ejabberd trunk version. This means that there is a possible optimization in ejabberd that does not deal with XEP33 at all. If properly investigated, this improvement could be included in ejabberd trunk and benefit all ejabberd deployments, not only the ones with multicast enabled.

Saturday, 18 August 2007

Temporary Lists of Recipients - proposal for XEP33

When I started my Google Summer of Code project three months ago, Tobias Markmann pointed me to his Temporary Lists of Recipients proposal.

The purpose is to reduce bandwidth consumption even further by sharing a common list of JIDs between the two entities that maintain a XEP33 communication.

The idea seems worth considering... once the current XEP33 is implemented and deployed in the XMPP world. So I'm bookmarking this proposal for future reference; let's see what happens.

Multiple replyto addresses, and enforcing all of them in XEP33

Yesterday I was chatting about XEP33 with Elmex in the ejabberd chatroom. He pointed me to a strange topic in this protocol:
`There MAY be more than one replyto or replyroom on a stanza, in which case the reply stanza MUST be routed to all of the addresses.'
Here is the chatroom log.

What does that mean? If a client receives a message with extended stanza addresses containing 100 replyto or replyroom addresses, and the user wants to answer, XEP33 forces the client to send the response to all 100 addresses. Why should we allow the sending entity to force the receiving entity to answer to all addresses, instead of giving it the power to answer only to some? Is this enforcement also present in the email world?

I think this topic could be reconsidered for the next XEP33 version.

ejabberd gets XEP-0133: Service Administration

One of the minor tasks in my Google Summer of Code project was to implement in ejabberd as many of the 31 commands described in XEP-0133: Service Administration as possible.

Aleksey Shchepin already implemented many commands in ejabberd more than 4 years ago. A year and a half ago Magnus Henoch updated them to use XEP-0050: Ad-Hoc Commands. So, I just had to update them a little to become XEP-0133 compliant:

  • 23. Send Announcement to Online Users
  • 24. Set Message of the Day
  • 25. Edit Message of the Day
  • 26. Delete Message of the Day
The commands that I implemented from scratch are:
  • 1. Add User
  • 2. Delete User
  • 5. End User Session
  • 6. Get User Password
  • 7. Change User Password
  • 9. Get User Last Login Time
  • 10. Get User Statistics
  • 13. Get Number of Registered Users
  • 15. Get Number of Online Users
  • 30. Restart Service
  • 31. Shut Down Service
Other commands are not implemented; I didn't add them because I consider that ejabberd already provides more suitable alternatives:
  • 8. Get User Roster
  • 18. Get List of Registered Users
  • 20. Get List of Online Users
  • 27. Set Welcome Message
  • 28. Delete Welcome Message
  • 29. Edit Admin List
And finally, I didn't implement those commands because they use features not available in ejabberd:
  • 3. Disable User
  • 4. Re-Enable User
  • 11. Edit Blacklist
  • 12. Edit Whitelist
  • 14. Get Number of Disabled Users
  • 16. Get Number of Active Users
  • 17. Get Number of Idle Users
  • 19. Get List of Disabled Users
  • 21. Get List of Active Users
  • 22. Get List of Idle Users
During this task, I found and reported some typo errors to the author of the XEP (Peter Saint-Andre).

Finally, I tested most commands with Tkabber SVN, Psi SVN and Gajim SVN. Sergei Golovan quickly fixed a small bug in Tkabber, and now all three clients work perfectly :)

I'm quite happy with the result, so I took this screenshot that depicts the impressive list of commands that allow an administrator to configure ejabberd just with a Jabber client:


Note that the commands are nested in the Service Discovery to allow the admin to find them more easily.

The patch is available here. I hope it is of high enough quality to enter ejabberd trunk easily, so it can be published in the next major ejabberd release.

Tuesday, 14 August 2007

Almost final GSoC project status

A month ago I posted my Midterm GSoC project status, and remaining work.

Since the previous status update I completed those tasks:

The remaining tasks that I'm aware of, from now until the end of my GSoC project are:
  • Implement or update as much as possible XEP133 Service Administration in ejabberd.
  • Perform code profiling to find bottlenecks and deficiencies in mod_multicast. Improve the code.
  • Perform benchmarks to check mod_multicast's effect in CPU, RAM and traffic consumption.
  • Prepare and submit patches to ejabberd bug tracker.
  • Upload final code to Google Summer of Code hosting.
  • Wait for ejabberd code reviewers, in case I need to fix any problem in my code before committing to ejabberd.
  • Discuss potential security and spam vulnerabilities (talk in the JDEV and JADMIN mailing lists).
  • Add XEP33 support to ejabberd's Pub/Sub and/or PEP service if their codebase is stable at the time.

Monday, 13 August 2007

XEP33 implementations: separate service or embedded support?

The current version of XEP-0033: Extended Stanza Addressing says:

The IM service MAY implement multicast directly, or it MAY delegate that chore to a separate service.
Where must a Jabber entity send message and presence stanzas with XEP33 addresses if it expects them to be routed as specified in XEP33? It must send them to a Jabber entity that advertises this feature: http://jabber.org/protocol/address.

What entities may support this feature? A Jabber server may have embedded support for XEP33; let's suppose the server JID is jabber.example.org. Or it can delegate that task to a separate service, whose JID could be multicast.jabber.example.org.

How can a Jabber entity know if its local server supports XEP33? By sending a disco#info query to the server (whose JID is jabber.example.org). However, this is not enough when the server delegates to a separate service. So, the entity should also ask the first-level services provided by the server: chatrooms.jabber.example.org, pubsub.jabber.example.org, ... and also multicast.jabber.example.org.

During my GSoC project, I implemented the server-part of XEP33 in an ejabberd module called mod_multicast. This module provides a separate service just for multicast. This means that an ejabberd server with JID jabber.example.org, with my work installed and enabled, will provide XEP33 support in a service with JID multicast.jabber.example.org.

I implemented it as a separate service for efficiency reasons. I consider that listening on the main server JID for XEP33-enabled stanzas would need more code (well, no more than 30 lines of code) and more computation than listening on a specific JID.

This is not a big problem with message and presence stanzas since the main server JID is not expected to receive message or presence stanzas at all. But think about iq stanzas. The server receives a lot of iq requests, and sends iq replies. Remember that a XEP33 server will send iq queries, and receive replies from remote servers. I thought that using the main JID both for typical IQ tasks and also for multicasting would be a little mess. So I preferred to keep all multicasting separate in a specific JID.

As XEP33 gets more widely adopted, maybe it will make sense to move all the XEP33 code from mod_multicast to an internal core file, and serve it embedded instead of as a separate service. But right now, I think the current solution is clean, efficient, and respects the protocol.

What about clients, and remote servers? Obviously, it isn't efficient to query all the first-level items of the server just to know if one of them supports XEP33. It would be faster to just ask the server. This translates into three costs: more code, more CPU consumption and more bandwidth consumption.

However, they are not much of a problem. Probably 20 or 30 lines of code are enough to program the loop that checks all the server items. And this check is done only the very first time a server queries another server. Once a server/client knows that jabber.example.org supports XEP33 in multicast.jabber.example.org, this knowledge is stored in a cache. When the cache item becomes obsolete (maybe after 12 or 24 hours), there is no need to perform another full disco traversal! The client only needs to revalidate the cache item, asking for features directly from multicast.jabber.example.org.
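A hypothetical sketch of that cache lookup (the names, the ETS table and the 12-hour validity are assumptions, not the actual mod_multicast code):

-define(CACHE_VALIDITY_SECONDS, 12*3600).

find_multicast_service(Server) ->
    Now = calendar:datetime_to_gregorian_seconds(calendar:universal_time()),
    case ets:lookup(multicast_cache, Server) of
        [{Server, ServiceJID, Stamp}] when Now - Stamp < ?CACHE_VALIDITY_SECONDS ->
            {cached, ServiceJID};
        [{Server, ServiceJID, _OldStamp}] ->
            %% Obsolete entry: revalidate the known JID with a direct disco#info,
            %% no need to repeat the full disco traversal of the server items.
            {revalidate, ServiceJID};
        [] ->
            %% Unknown server: query it and then its first-level items.
            full_disco_traversal
    end.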

I'm aware of only three programs that implement XEP33, or a part of it:
  • Openfire server has basic support of XEP33. It provides the feature embedded. It only queries the server, not the services.
  • Psi client has very basic support for sending XEP33 message stanzas. It only queries the server, not the services.
  • Tkabber client has very basic support for showing extended information included in XEP33 message stanzas. Since it does not send XEP33 stanzas, it does not need to query for XEP33 support.
This means that ejabberd's mod_multicast can send to Openfire. But Openfire and Psi can't send to ejabberd because they are unaware that the ejabberd server has XEP33 support in a separate service. Note that all three programs implement XEP33 correctly. And even then, they are incompatible in practice.

Yesterday I chatted about this issue with Gaston Dombiak (Gato from Openfire) and Kevin Smith (Kev from Psi). They are interested in implementing the rest of the XEP, including the part that I explained previously. Of course, this interest is conditioned on the success of the protocol: it must also be implemented by other software, and be widely used.

So, once a new and updated version of XEP33 is published with the improvements that I proposed to Peter Saint-Andre, I'll file a bug report in the Psi and Openfire bug trackers.

Until then, I still need to do some cleaning and profiling in mod_multicast.

Friday, 10 August 2007

Summary of XEP33 addresses limits

This post summarizes and updates all that I have said over the past weeks in these posts: The limit of addresses in XEP33 must be fixed, XEP33: types of limits and default values, XEP33: Tell limits in disco#info response using XEP128, and Updates to XEP33 limits proposal.


Introducing the problem

Let's suppose that limiting the number of destination addresses in a XEP33 stanza really serves a purpose, for example, to prevent or reduce abuse of the multicast service. To count how many 'addresses' there are in a stanza, only TO, CC and BCC addresses are considered, since those are the ones that will generate traffic.

XEP33 says that a server should have a limit for the maximum number of addresses allowed on a single packet: the limit SHOULD be more than 20 and less than 100.

That limit is easy to implement on the receiving party. But what happens with the sender? How many addresses can a sender put in each packet? If it puts too many, the packet will be rejected. If it puts too few, it is not profiting from XEP33 as much as it could.

In the current version of XEP33, remote servers may allow as few as 20 or as many as 100 addresses. This means that a sender has to reduce to the lowest common value in order not to get rejections: it can only send at most 20 addresses in each packet.

If we already know that 20 is the maximum limit in practice, then why bother telling admins that they can allow 30, 40 or more on their servers? Nobody will send more than 20 addresses in a packet!


Proposed solution: configurable limits, and method to inform

Allow configurable limits in the protocol for each different condition, define default values in the protocol, and describe a method for senders to learn which limits are applied on each destination server.

Other possible limitations to reduce abuse of a multicast service are the number of messages per minute, the number of addresses per minute, the total bytes sent... But I don't expect them to be of interest for inclusion in XEP33.


Types of limits

Several limits can be defined, depending on the different characteristics of a XEP33 stanza:

  • sender is: local or remote
  • the stanza type is: message or presence. Note that iq stanzas don't directly include XEP33 addresses.
There is no way to know whether a XEP33 stanza was sent by a user or by a server/service, so that categorization is not possible.

Those categories do not allow differentiating the stanzas sent by a trusted local service (like MUC or Pub/Sub components) from the rest of the possible senders. Obviously, the trusted local services operated by the same administrator that installed the multicast service should have unrestricted access to it. This possibility is an implementation-specific issue which will not be covered by XEP33.

The mentioned stanza characteristics allow defining 4 different limits:
  • local message
  • local presence
  • remote message
  • remote presence
The allowed values for the limits are:
  • Positive integers, including zero: 0, 1, 2, ...
  • the keyword 'infinite', which means that the limit is not applied at all.

Method to inform

This method uses XEP-0128: Service Discovery Extensions, as proposed by Ralphm in a comment.

How does this work? Currently, when an entity wants to send a XEP33 stanza, it first checks if there is a XEP33-enabled service available. To check that, it sends a disco#info query to the service and looks for the http://jabber.org/protocol/address feature in the response.

If there are limits to inform about, the disco#info response not only announces XEP33 support, but also announces the exact limits in effect in the service.

When a multicast service announces limits in a disco#info response, it SHOULD only report limits which are configured to a value different from the default defined in XEP33. So, if XEP33 says that a given limit is 20, but the limit in effect on a server is 30, then the server must report that limit. If the limit in effect is the default value, then it SHOULD NOT be specified at all in disco#info, to save bandwidth.

Similarly, when a multicast service announces limits in a disco#info response, it SHOULD only report limits which are going to be applied to the entity that performs the request. The reason is that users of the local server and users/servers/services which are remote will have different limits, and it's a waste of bandwidth to announce limits to an entity that will never be affected by them.

The entity that requested this info must cache those limits for later reference.

Let's see an example. The Jabber server capulet.com wants to send a stanza with XEP33 addresses to the Jabber server shakespeare.lit. The response announces XEP33 support, and also provides information about several limits:

<iq type='get'
    from='capulet.com'
    to='shakespeare.lit'
    id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>

<iq type='result'
    from='shakespeare.lit'
    to='capulet.com'
    id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <identity
        category='server'
        type='im'
        name='shakespeare.lit jabber server'/>
    ...
    <feature var='http://jabber.org/protocol/address'/>
    <x xmlns='jabber:x:data' type='result'>
      <field var='FORM_TYPE' type='hidden'>
        <value>http://jabber.org/protocol/address</value>
      </field>
      <field var='message'>
        <value>20</value>
      </field>
      <field var='presence'>
        <value>infinite</value>
      </field>
    </x>
    ...
  </query>
</iq>


Apply limits to incoming stanzas

When a stanza is received by a XEP33-enabled entity to be routed to other destinations, the number of destination addresses is compared to the limit in effect for that kind of stanza. If the stanza has more addresses of type TO, CC or BCC than allowed, an error message is returned to the original sender.
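As an illustration, a check like this could be applied on reception (a hedged sketch using ejabberd's classic {xmlelement, Name, Attrs, Children} tuples; the function and helper names are made up):

check_limit(Limit, AddressEls) ->
    %% Only 'to', 'cc' and 'bcc' addresses count against the limit.
    Counted = [El || {xmlelement, "address", Attrs, _} = El <- AddressEls,
                     lists:member(proplists:get_value("type", Attrs),
                                  ["to", "cc", "bcc"])],
    case {Limit, length(Counted)} of
        {infinite, _}          -> ok;
        {Max, N} when N =< Max -> ok;
        {Max, _}               -> {error, {too_many_addresses, Max}}
        %% the caller then builds and returns an error stanza to the sender
    end.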


Take into account limits when sending stanzas

When any Jabber entity is about to send a XEP33 stanza, it MUST make sure the number of destination addresses is not greater than the limit reported by the destination entity. If it is greater, the destinations can be split into several groups (or batches).

Tuesday, 7 August 2007

On travel for the next 3 days

This is just to let you know that I'll be 'completely away from keyboard' for the next three days. The expected return date is the evening of 9 August GMT.

Don't worry about the progress of my GSoC project: I'll carry my hand-written design diagrams for the next mod_multicast code I'll write, blank paper, a black pen, a blue pen, and Programming Erlang, Software for a Concurrent World.

Updates to XEP33 limits proposal

I previously proposed some limits for the number of addresses and how to report them in the disco#info response using XEP128.

All this needs some modifications, which I explain now.

1. To count how many 'addresses' there are in a stanza, only TO, CC and BCC addresses are considered, since those are the ones that will generate traffic.

2. When a multicast service announces limits in a disco#info response, it SHOULD only report limits which are configured to a value different from the default defined in XEP33. So, if XEP33 says that a given limit is 20, but the limit in effect on a server is 30, then the server must report that limit. If the limit in effect is the default value, then it SHOULD NOT be specified at all in disco#info, to save bandwidth.

3. Similarly, when a multicast service announces limits in a disco#info response, it SHOULD only report limits which are going to be applied to the entity that performs the request. The reason is that users of the local server and users/servers/services which are remote will have different limits, and it's a waste of bandwidth to announce limits to an entity that will never be affected by them.

4. The limits that are worth considering can't be categorized into 'user' or 'server', since the multicast service does not have an easy way to know whether a stanza was generated by a user or by a server. So, the characteristics of a XEP33 stanza that can be used to differentiate stanzas and apply fine-grained limitations are:

  • sender is: local or remote
  • the stanza type is: message or presence
Those categories do not allow differentiating the stanzas sent by a trusted local service (like MUC or Pub/Sub components) from the rest of the possible senders. Obviously, the trusted local services operated by the same administrator that installed the multicast service should have unrestricted access to it. This possibility is an implementation-specific issue which will not be covered by XEP33.

Monday, 6 August 2007

GSoC status update: collateral tasks

During the last week I haven't dedicated time to coding in my GSoC project. Instead, I focused on other stuff that is not directly related, but that I consider important too.

I summarized my proposed changes to XEP33 in the XEP33 wiki page, and pinged Stpeter to take a look.

I participated in the discussions on the ejabberd mailing list about ejabberd project management, release cycle, bug tracker, etc. I hope that in the next weeks documents will appear that describe ejabberd project management, how to submit patches...

I also started to learn basic Emacs usage (it took me a full day to customize it to my needs). I'm a Vim guy, and I find it better suited for programming, but now I'll use Emacs for SVN tasks. Emacs helps with ChangeLog writing; psvn.el helps with SVN; and erlang-mode provides a standard code indentation system, among other things.

This week was not completely lost, after all. In fact, GSoC is not only about 'producing code', but also about learning. And I learned a lot this week.

Ahh! I also started to practice car driving, for the first time in my life. There isn't a particular reason to learn now and not before. Well, maybe I thought: if I started learning Emacs, why not car driving? Self-learning rules.

Now it's time for GSoC coding. I'm designing, coding and testing XEP33 addresses limits in ejabberd's mod_multicast.

Monday, 30 July 2007

XEP33: Tell limits in disco#info response using XEP128

Some time ago I explained why The limit of addresses in XEP33 must be fixed. Of the proposed solutions, I prefer #3: 'Configurable limit, and method to inform'.

A method to inform is using XEP-0128: Service Discovery Extensions, as proposed by Ralphm in a comment.

How does this work? Currently, when an entity wants to send a XEP33 stanza, it first checks if there is a XEP33-enabled service available. To check that, it sends a disco#info query to the service and looks for the http://jabber.org/protocol/address feature in the response.

If we also use XEP128, the disco#info response will not only announce XEP33 support, but also announce the exact limits in effect in the service.

Let's see an example. The Jabber server capulet.com wants to send a stanza with XEP33 addresses to the Jabber server shakespeare.lit. The response announces XEP33 support, and also provides information about several limits:

<iq type='get'
    from='capulet.com'
    to='shakespeare.lit'
    id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'/>
</iq>

<iq type='result'
    from='shakespeare.lit'
    to='capulet.com'
    id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <identity
        category='server'
        type='im'
        name='shakespeare.lit jabber server'/>
    ...
    <feature var='http://jabber.org/protocol/address'/>
    <x xmlns='jabber:x:data' type='result'>
      <field var='FORM_TYPE' type='hidden'>
        <value>http://jabber.org/protocol/address</value>
      </field>
      <field var='limit-remote-user'>
        <value>20</value>
      </field>
      <field var='limit-remote-service'>
        <value>100</value>
      </field>
      <field var='limit-local-user'>
        <value>300</value>
      </field>
    </x>
    ...
  </query>
</iq>

Saturday, 28 July 2007

XEP33: types of limits and default values

I've argued in the past that, if we want to limit the number of destination addresses in a XEP33 stanza, we must fix the XEP. One improvement I propose is to report the limits in the disco#info response using XEP128.

Let's suppose that limiting the number of destination addresses in a XEP33 stanza really serves a purpose, for example, to prevent or reduce abuse of the multicast service. If you accept this supposition, then you probably want fine-grained limits, so you can define different limits depending on the sending entity, the type of stanza...

The different characteristics of a XEP33 stanza are:

  • sender is: local or remote
  • sender is: user or server/service/component
  • the stanza type is: message or presence. Note that iq stanzas don't directly include XEP33 addresses.
There are eight possible permutations, and I associate a limit to each one. Let's review them in detail:
  • limit-remote-user-message: A spammer can create accounts in a remote friendly server, and send XEP33 stanzas to our multicast service. I think that if a user wants to use a multicast service, he should ask the administrator of his server to install it, instead of using our multicast service. So, this limit should be 0 by default, which means users of remote servers can't send XEP33 stanzas directly to us.
  • limit-remote-user-presence: Same as above, default: 0.
  • limit-remote-server-message: There are many reasons for a remote server to send us a message stanza with XEP33 addresses: MUC message, pubsub message, user message... I propose a default limit of 20.
  • limit-remote-server-presence: A remote server sends presence stanzas when a user logs in, logs out or changes presence.
    I consider that presence stanzas are not as annoying as message or iq stanzas. I propose a default limit of 100.
  • limit-local-user-message: In a publicly accessible Jabber server with unrestricted account registration (such as jabber.org), spammers can create accounts in jabber.org and use the local multicast service to send spam messages both to local and remote users/servers. I propose a default value of 20.
  • limit-local-user-presence: I don't see any reason for a user to send a presence stanza to several destinations. Same as with remote servers, I propose a default limit of 100.
  • limit-local-server-message: Obviously, this limit should be infinite always, since a multicast service is expected to trust a local server.
  • limit-local-server-presence: Same as above, default: infinite.
Summarizing, I propose these limits:
  • zero: remote-user-*
  • infinite: local-server-*
  • variable: remote-server-* and local-user-*
Once XEP33 defines exact default values, if the limits in a XEP33 deployment are the default values, those limits SHOULD NOT be reported in disco#info responses.


PS: Other possible limitations to reduce abuse of a multicast service are the number of messages per minute, the number of addresses per minute, the total bytes sent... But I don't expect them to be of interest for inclusion in XEP33.

Tuesday, 17 July 2007

Midterm GSoC project status, and remaining work

The Google Summer of Code program is halfway to the end. This is a summary of the work accomplished so far:

  • Implemented or updated several small XEPs in ejabberd: Contact Addresses, Delayed Delivery... Patches are awaiting review and integration in ejabberd.
  • Implemented XEP33 in a Jabber component for ejabberd. All the code lives in mod_multicast.erl. Code is currently in ejabberd-modules SVN.
  • Describe Server Active Multicast
  • Describe how to implement XEP33 in the server's C2S.
  • Added XEP33 support to ejabberd core (sending presence updates). Code in ejabberd-modules SVN.
  • Describe how to implement XEP33 in a MUC service
  • Added XEP33 support to ejabberd's mod_muc. Code in ejabberd-modules SVN.
  • Alpha-testing the code with small examples. Fix any bug, improve any potential drawback in the existing code.
The remaining tasks that I'm aware of, from now until the end of my GSoC project are:
  • Discuss improvements in the current XEP33 definition of limits.
  • Implement the improvements for limits.
  • Discuss potential security and spam vulnerabilities, and how to prevent them.
  • Propose improvements for XEP33. Most of the text can be reused from my previous blog posts.
  • Perform benchmarks to check mod_multicast's effect in CPU, RAM and traffic consumption.
  • Test compatibility with other XEP33 existing implementations.
  • Add XEP33 support to mod_pubsub and/or mod_pep if their codebase is stable at the time
  • Write documentation for ejabberd Guide.
  • Implement or update ejabberd's XEP133 Service Administration
  • Wait for ejabberd code reviewers, in case I need to fix any problem in my code before committing to ejabberd.
I update the project timeline weekly. Considering my working speed and the difficulty of the remaining tasks, I consider my project to be on track, and expect it to be completed on time.

mod_multicast - Big rewrite: replaced loop with pool

During the past days I concentrated on improving a critical part of mod_multicast: the code that checks for protocol support on remote servers.

Originally, all the processing for a user stanza was done in a single, large, procedural run. This included a loop that slowly checked XEP33 support for each remote server, sent the stanza accordingly, and then proceeded to the next server.

Now, the loop only sends the stanzas to the servers whose support is already known. For the unknown servers, it only sends the iq disco#info request, but does not wait for an answer. Instead, the group of destinations related to that server (now considered 'Waiters') is temporarily stored in a 'Pool'. Eventually the server answers, or an error is received; mod_multicast then reads the Waiters group from the Pool and finally sends the stanza.
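A rough sketch of the idea (hypothetical names and data layout, not the actual mod_multicast code):

%% Destinations whose server support is still unknown wait in the Pool.
handle_group(From, Packet, Server, Dests, State) ->
    case cache_lookup(Server, State) of
        {known, ServiceJID} ->
            send_multicast(From, ServiceJID, Dests, Packet);
        {known, none} ->
            send_individually(From, Dests, Packet);
        unknown ->
            send_disco_info_request(Server),                   % ask, but don't wait
            add_waiters(Server, {From, Dests, Packet}, State)  % park them in the Pool
    end.

%% Called when the disco#info answer (or an error) finally arrives.
handle_disco_response(Server, Response, State) ->
    Waiters = take_waiters(Server, State),
    Support = parse_xep33_support(Response),
    cache_store(Server, Support, State),
    [deliver(Support, Waiter) || Waiter <- Waiters],
    ok.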

Future work: I must fix the new bugs related to the new code. Later I'll reconsider whether the current Pool needs further changes. And finally, I can go back to the XEP33 problem with the address limits.

Monday, 9 July 2007

The limit of addresses in XEP33 must be fixed

Problem

XEP33 says that a server should have a limit for the maximum number of addresses allowed on a single packet: the limit SHOULD be more than 20 and less than 100.

That limit is easy to implement on the receiving party. But what happens with the sender? How many addresses can a sender put in each packet? If it puts too many, the packet will be rejected. If it puts too few, it is not profiting from XEP33 as much as it could.

In the current version of XEP33, remote servers may allow as few as 20 or as many as 100 addresses. This means that a sender has to reduce to the lowest common value in order not to get rejections: it can only send at most 20 addresses in each packet.

If we already know that 20 is the maximum limit in practice, then why bother telling admins that they can allow 30, 40 or more on their servers? Nobody will send more than 20 addresses in a packet!

Proposed solutions

I can see three solutions to the current situation:

  1. Unlimited addresses
  2. Strict limit
  3. Configurable limit, and method to inform
Since each one has pros and cons, let's see them in more detail.

#1: Unlimited addresses

Don't force any limit at all: XEP33 servers must accept a packet with as many addresses as the sender desires. The limit in this case is not imposed by XEP33, but by the server implementation, which usually limits the size of any XMPP packet.

This solution is the easiest of them all to implement.

This solution can be potentially damaging. For example, ejabberd limits the size of any received packet to 64 KB, and the size of an 'address' element is around 50 bytes. A spammer could therefore construct a stanza with 4 KB of message and 60 KB of destination addresses. In practice, this allows the spammer to send one packet to a Jabber server that implements XEP33, and the server itself will send 4 KB of spam to 1200 local accounts. And all this damage with just a single XMPP stanza.

#2: Strict limit

Define a strict limit in XEP33 that all senders and receivers must agree on, for example 100. This way all Jabber servers, components and clients that implement XEP33 know how many addresses can be sent in a packet.

This solution is easy to understand and to implement.

However, it limits the power of XEP33 in certain situations. For example, on a restricted and controlled private network where spammers are not an issue, it may be preferable to allow up to 1000 addresses.

#3: Configurable limit, and method to inform

Allow a configurable limit in the protocol, for example between 20 and 100 as currently, and describe a method for senders to learn which limit is applied on each destination server.

This solution is the hardest to implement: it makes XEP33 more complex, and requires more code on both senders and receivers. The benefit of this solution is that it allows XEP33 to adapt to different network conditions.

Some topics that must be addressed if this solution is used:
  • The disco#info response that reports XEP33 support could indicate the limit:
    <feature var='http://jabber.org/protocol/address' limit='50'/>
  • If a stanza is sent with more than the allowed addresses, the resulting error stanza informs of the limit:
    <not-acceptable limit='50'/>
Temporary solution in ejabberd

Until a definitive solution is adopted for this problem, I implemented in ejabberd the easiest possible solution ever: never send a packet with more than 20 addresses. If this limit is reached, simply send several packets with smaller lists of destination addresses.
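A minimal sketch of that workaround (route_with_addresses is a made-up helper standing for the code that attaches the 'addresses' element and routes the stanza):

-define(MAX_ADDRESSES, 20).

send_in_chunks(_From, _Packet, []) ->
    ok;
send_in_chunks(From, Packet, Addresses) when length(Addresses) =< ?MAX_ADDRESSES ->
    route_with_addresses(From, Packet, Addresses);
send_in_chunks(From, Packet, Addresses) ->
    {Chunk, Rest} = lists:split(?MAX_ADDRESSES, Addresses),
    route_with_addresses(From, Packet, Chunk),
    send_in_chunks(From, Packet, Rest).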

Tuesday, 3 July 2007

ejabberd's C2S with XEP33 support

I have improved ejabberd's C2S code to use XEP-0033: Extended Stanza Addressing when users log in, log out or update their presence.

This means that when a user sends initial presence to log in, logs out, or updates his presence, the server, instead of sending a single packet to each contact on his roster, tries to send one packet to each destination server.
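The core of the change can be pictured like this (a hedged sketch with made-up helper names; jid_server/1 stands for extracting the server part of a JID):

%% Group the roster JIDs by their server, so that one presence stanza with
%% bcc addresses can be routed per destination server instead of one per contact.
group_by_server(JIDs) ->
    dict:to_list(
      lists:foldl(
        fun(JID, Acc) ->
                dict:append(jid_server(JID), JID, Acc)
        end,
        dict:new(), JIDs)).
%% Returns [{Server, [JID]}].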

As usual, I've committed the change to the ejabberd-modules SVN, and the changes can be seen here: ejabberd_c2s.erl.

As previously with the MUC service, I followed the Guidelines for Server Active Multicasting on XEP33.

Tuesday, 26 June 2007

ejabberd's mod_muc with XEP33 support

I have improved mod_muc (ejabberd's implementation of XEP-0045: Multi-User Chat) to use XEP-0033: Extended Stanza Addressing in certain situations. This is still preliminary and includes some ugly code, but it seems to work.

Of course I followed the Guidelines for Server Active Multicasting on XEP33.

As previously mentioned in Multi User Chat and Extended Stanza Addressing, improving room presence broadcasts to use XEP-33 will be a little difficult. So, for now I concentrated on the easiest and potentially most beneficial improvement: message broadcasts.

Multi User Chat and Extended Stanza Addressing

A Multi-User Chat service sends a lot of similar messages to a lot of JIDs, so it would seem this service can vastly benefit from using Extended Stanza Addressing.

We can group the possible targets in:

  • Sending message broadcasts (section 7.9 of XEP-45)
  • Sending system advisories: change on subject, change on room configuration, system messages (sections 8.1, 10.9)
  • Sending presence broadcasts: join, leave, nick change, granting/removing privileges... (sections 7.1.3, 7.2, 7.3, 7.4, 8.2, 8.3, 8.4, 8.5, 9.1, 9.2, 9.3...)
Upgrading a MUC service to send message broadcasts and system message broadcasts using a multicast service (XEP-33) is easy, since the exact same message stanza is sent to all the room occupants.

However, presence broadcasts are slightly different depending on the room configuration and the destination's role and affiliation: in some cases the presence stanza includes additional attributes (jid). So, upgrading a MUC service to send presence broadcasts using a multicast service is slightly more complicated.

Monday, 25 June 2007

Guidelines for Server Active Multicasting on XEP33

Jabber servers and components sometimes send the same stanza to several destinations. Using XEP33 in such situations, they could save traffic. I'll now provide some guidelines to implement that feature.

Guidelines

1. If there's a local multicast service, the sender SHOULD send a single stanza with an 'addresses' element as described in XEP33. If no local multicast service is available, obviously it MUST send multiple stanzas as usual.

2. Each desired destination address must be included as an 'address' element, as a child of the 'addresses' element, as described in XEP33.

3. All the 'address' elements MUST include the attribute 'jid'. It is not allowed to use the attribute 'uri'.

4. All the 'address' elements MUST include the attribute 'type' with value 'bcc'. It is not allowed to put any other value in this attribute.

5. All the 'address' elements SHOULD NOT include any other attribute, since they will never be presented to the destination entity.


Example

In this example I'll show how a MUC service that follows the proposed guidelines sends a message to all the occupants of a room.

Scenario

In this example I use the following JIDs, which are described in XEP45 section 4.3:

  • the Jabber server: shakespeare.lit
  • the multicast service: multicast.shakespeare.lit
  • the MUC service: macbeth.shakespeare.lit
  • the room: darkcave@macbeth.shakespeare.lit
  • the room occupants:
    • crone1@shakespeare.lit/desktop
    • wiccarocks@shakespeare.lit/laptop
    • hag66@shakespeare.lit/pda
Without XEP33

Section 7.9, Sending a Message to All Occupants, of XEP45 shows a room occupant sending a message to the room. Then the MUC service sends this message to all the room occupants.

Example 60 shows the three stanzas sent by the MUC service to the room occupants:
<message
    from='darkcave@macbeth.shakespeare.lit/thirdwitch'
    to='crone1@shakespeare.lit/desktop'
    type='groupchat'>
  <body>Harpier cries: 'tis time, 'tis time.</body>
</message>

<message
    from='darkcave@macbeth.shakespeare.lit/thirdwitch'
    to='wiccarocks@shakespeare.lit/laptop'
    type='groupchat'>
  <body>Harpier cries: 'tis time, 'tis time.</body>
</message>

<message
    from='darkcave@macbeth.shakespeare.lit/thirdwitch'
    to='hag66@shakespeare.lit/pda'
    type='groupchat'>
  <body>Harpier cries: 'tis time, 'tis time.</body>
</message>


With XEP33

If the MUC service supports XEP33, instead of sending three similar stanzas it sends only one to the multicast service:

<message
    from='darkcave@macbeth.shakespeare.lit/thirdwitch'
    to='multicast.shakespeare.lit'
    type='groupchat'>
  <body>Harpier cries: 'tis time, 'tis time.</body>
  <addresses xmlns='http://jabber.org/protocol/address'>
    <address type='bcc' jid='crone1@shakespeare.lit/desktop'/>
    <address type='bcc' jid='wiccarocks@shakespeare.lit/laptop'/>
    <address type='bcc' jid='hag66@shakespeare.lit/pda'/>
  </addresses>
</message>

Sunday, 24 June 2007

XEP33 and Server Active Multicasting

XEP-0033: Extended Stanza Addressing specifies how Jabber entities can define multiple destination addresses on a XMPP stanza.

This is interesting when a user wants the destinations to know that he sent the same message to all the other destinations (Carbon Copy), or to send a copy to a supervising user (Blind Carbon Copy). The current version of XEP33 explains this basic usage, including an example.

XEP33 can also be used by Jabber servers, MUC services, etc. to reduce bandwidth consumption. This possibility is hinted at in the introduction of XEP33, but is not described further in the body of the document. I call this usage Server Active Multicasting, since the server (or a server component) actively decides to use multicasting: instead of sending individual stanzas to each destination, it tries to send stanzas to groups of users.

There are many opportunities on the server side to use XEP33 to reduce traffic, for example:

  • On the Jabber/XMPP server: to send presence changes
  • MUC service: to send chatroom messages, subject changes, presence changes, join, leaves, nick changes...
  • Pubsub/PEP service
  • Subscription/group list service
  • RSS transport
As part of my GSoC project, my next steps are to define some guidelines to implement Server Active Multicasting, and of course to implement this in ejabberd.

Tuesday, 19 June 2007

mod_multicast: completed XEP33 support

During the past week I fixed several small problems in the source code of mod_multicast:

  • Periodically remove from the cache those items that are really old (a rough sketch of the idea follows this list).
  • If the packet has a wrong xmlns, or no 'addresses' or 'address' element, it now refuses to process the packet and reports an error to the sender.
  • Fixed reception of IQ Query packets. This topic deserves a blog post, or even a documentation section.
  • Added support for XEP-0092: Software Version. Currently it shows the SVN revision.
  • I verified that mod_multicast does not interfere with Privacy Lists. I initially thought a problem could appear when sending a multicast packet internally in ejabberd. In fact, everything seems to work correctly.
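Regarding the first item, this is roughly the idea of the periodic cleaning. It is only a sketch, not the actual mod_multicast code: I assume here that the cache is an ETS table of {ServerHost, Response, Timestamp} tuples, with timestamps expressed in seconds:

%% Sketch only: delete cached responses older than MaxSeconds.
purge_old_entries(CacheTable, MaxSeconds) ->
    {Mega, Secs, _} = erlang:now(),
    Now = Mega * 1000000 + Secs,
    OldServers = ets:foldl(
                   fun({Server, _Response, Timestamp}, Acc)
                         when Now - Timestamp > MaxSeconds ->
                           [Server | Acc];
                      (_Entry, Acc) ->
                           Acc
                   end, [], CacheTable),
    lists:foreach(fun(Server) -> ets:delete(CacheTable, Server) end,
                  OldServers).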
So now I consider the implementation of mod_multicast to be complete. At least it supports XEP-33 v1.1.

The next steps are: active server multicast, and abuse/spam prevention.

Saturday, 16 June 2007

Packet Relay? No, thanks

During my development of a multicast service for ejabberd that implements XEP33: Extended Stanza Addressing, I found a phrase that caught my attention:

The server MAY choose to limit whether non-local servers can send address headers that require the local server to send to third parties (relaying).
Packet relay: server A wants to send a packet to server B. Instead of simply doing that, it sends the packet to server C, requesting it to resend (relay) it to server B.

Why should a public Jabber server accept orders from another Jabber server to send packets to a third Jabber server?

This feature may or may not be useful on open networks (like the XMPP Federation), and may or may not be useful on private networks. But I think that packet relay does not solve any critical problem right now, while it brings many new undesired problems.

Just to name one problem related to packet relay: abuse to send spam. You just need to look at what happened with open email relays.

I've removed all support for packet relay from my implementation for ejabberd. I have proposed to describe packet relay support as an optional feature in XEP33, and to discourage its usage on open networks.

If a server developer wants to implement it, ok, implement it. If a Jabber server admin needs that feature for some reason on his private network, ok, enable it. But I think there are more important features that I must implement during this GSoC project than packet relay.

Please post a comment if you would like to have packet relay support on ejabberd's multicast service. If so, explain why you find that feature useful.

Related links: MUC logs.

Saturday, 9 June 2007

Extended Stanza Addressing: initial commit to SVN

I've committed the initial version of mod_multicast, the module for ejabberd that implements Extended Stanza Addressing (XEP-33), to the ejabberd-modules SVN repository.

It's an independent component. It only acts passively: when a user or server sends it a packet, it checks the packet and routes it.

This version implements almost all the tasks described in the protocol: wait for packets, query other servers and components for support, cache the responses, verify the age of cached responses, check packet syntax and route to the final destinations, either local or remote.
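For readers unfamiliar with ejabberd components, this is roughly how such a passive service receives packets. It is a simplified sketch, not the actual mod_multicast code, and do_route/3 is a hypothetical placeholder for the checking and routing logic:

%% Sketch only: the service process registers a route for its virtual host
%% and then receives the stanzas addressed to it as {route, From, To, Packet}
%% messages.
init([ServerHost, _Opts]) ->
    MyHost = "multicast." ++ ServerHost,
    ejabberd_router:register_route(MyHost),
    {ok, MyHost}.

handle_info({route, From, To, Packet}, MyHost) ->
    do_route(From, To, Packet),
    {noreply, MyHost};
handle_info(_Info, State) ->
    {noreply, State}.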

I think the only missing feature is to clean the cache periodically. After that, I want to verify that access control works correctly both for local users and for remote users/servers.

Implementing XEP33 is not the end of my project. It seems it's only the beginning: that XEP needs some parts to be rewritten and several aspects to be added; for example, error reporting, and how exactly to use it with MUC and Pub-Sub.

There isn't yet a public test server where people can try the implementation. For now, you can get the code from SVN, compile it, install it on your ejabberd server and try it. Please don't use this on a production server! Tobias Markmann has already commented that he will try it. /me crosses fingers.

Sunday, 27 May 2007

Returned from Canada

I've returned from my trip. It was nice. Almost a perfect trip: no lost baggage, no plane crash on a lonely island...

I even got a surprise at the conference where I presented my paper: I received a 'Nokia Student Travel Award'!

Thursday, 17 May 2007

PhD travel from 18 to 26 May

I'll be travelling from 18 to 26 May 2007.

This was already planned in my GSoC project timeline. Work directly related to XEP-33 will start on 28 May, as planned.

Ah, I forgot to mention: I'll be in Toronto, Canada.

Completed XEP-0157 and XEP-0203

I've published the first results of my GSoC project.

The task consisted of adding or updating support for two protocols in ejabberd:
XEP-0157: Contact Addresses for XMPP Services
XEP-0203: Delayed Delivery

The objective of this task was to get some experience reading XEPs. The coding part was easy, since both protocols were rather small and easy to implement. So I decided to also update some related code.
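For those curious about what the Delayed Delivery update means in practice, the gist is the move from the legacy 'jabber:x:delay' element to the 'urn:xmpp:delay' one. The sketch below just builds both variants as ejabberd xmlelement tuples; the timestamp values are only examples:

%% Sketch only, for comparison of the two notations.
%% XEP-0091 (legacy) used the 'jabber:x:delay' namespace and a compact
%% timestamp; XEP-0203 uses 'urn:xmpp:delay' and an XEP-0082 timestamp.
legacy_delay(Stamp) ->                 % e.g. Stamp = "20070517T20:15:00"
    {xmlelement, "x",
     [{"xmlns", "jabber:x:delay"}, {"stamp", Stamp}], []}.

xep0203_delay(Stamp) ->                % e.g. Stamp = "2007-05-17T20:15:00Z"
    {xmlelement, "delay",
     [{"xmlns", "urn:xmpp:delay"}, {"stamp", Stamp}], []}.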

More info and patch downloads: XEP-0157 and XEP-0203.

The plan is to commit those patches to ejabberd SVN soon, even before GSoC finishes. I hope there aren't any bugs. If there are, please report them here or on the bug tracker.

Wednesday, 9 May 2007

Project Timeline

The project timeline is grouped in weeks, which start on Monday. Each week has a clear and independent task, which must be finished before the end of the week. Some task descriptions are still a little vague. I'll update this timeline as work progresses.

April 30 - Completed
Read the protocols, understand them and clarify all the initial doubts. This will also serve to get in contact with my mentor, the XEP author, the standards mailing list and maybe other implementors.

May 7 - Completed
Smaller XEPs: check which parts of them are already implemented, and whether they require an update.

May 14 - Completed
Start coding: Implement Contact Addresses and Delayed Delivery.

May 21 - Completed
PhD travel

May 28 - Completed
Initial design of the XEP-33 component.

June 4 - Completed
Start implementation of XEP-33. Publish first version.

June 11 - Completed
Test code for bugs. Discuss error handling, relay support. Update code accordingly if only minor changes are required.

June 18 - Completed: SAM, guidelines for SAM, MUC + SAM ideas
Design guidelines for server active multicasting, and try to implement them in mod_muc for testing purposes.

June 25 - Completed: mod_muc and C2S
Add XEP-33 support to mod_muc.
Add XEP-33 support to ejabberd core (sending presence updates).

July 2 - Completed
Review for bugs, compliance. Try to fix code ugliness.

July 9 - Completed: loop -> pool
Continue with review of existing code.

July 16 - Completed
Propose address limits in XEP33:
The limit of addresses in XEP33 must be fixed

July 23 - Completed
XEP33: types of limits and default values
XEP33: Tell limits in disco#info response using XEP128

July 30 - Completed
Learn to use Emacs, psvn and ChangeLog.

August 6 - Completed
Implement limits in addresses.
Updates to XEP33 limits proposal
Write documentation for the ejabberd guide.
Test compatibility with other software: Openfire, Tkabber, Psi.

August 13 - In progress
Implement as much as possible of XEP-0133 Service Administration.
Code profiling to find bottlenecks and deficiencies in mod_multicast.
XEP33 protocol update (chat with Stpeter)

August 20 -
Perform benchmarks (chat with Mremond).
Upload final code to Google Summer of Code hosting.
Prepare and submit patches to ejabberd bug tracker.

Tasks that will be addressed at a later time
Add XEP-33 support to ejabberd's Pub/Sub service (talk with Aleksey and Legoscia).
Discuss potential security and spam vulnerabilities (talk in JDEV and JADMIN mailing lists).

Friday, 20 April 2007

Proposal accepted at GSoC 07!

The project proposal I submitted to Google Summer of Code 2007 has been accepted.

That means that this summer (from June through August) I will work on that project and will be conveniently paid. In other words, I get paid to work: 4500 USD, roughly 3300€ at the current exchange rate, before taxes.

The project mainly consists of implementing Extended Stanza Addressing in ejabberd. The former is a protocol that allows sending the same stanza to many people, and the latter is a Jabber/XMPP instant messaging server.

Tuesday, 17 April 2007

What we expect from a GSoC student

I found some recommendations for GSoC developers, which may seem obvious, but are quite useful:

What we expect

* CODE:
- Clean (see CodingStyle).
- Working (try to do small changes, step by step, returning to a working state as often as possible).
- Tested (setup a local test wiki, write unit tests, ...).

* DOCS:
- For developers (e.g. docstrings)
- For users (where appropriate - e.g. as CHANGES entries or Help* wiki pages).

* Regular communication:
- Stay online on #moin-dev.
- Talk about your plans and what you do.
- Ask for help if you are blocked.

* Regular work:
- Citing Google: "your main activity for the summer".
- There must be at least 1 push to your public repo for each day you worked on your project (try to do clean commits, 1 commit per feature / per sub task).

Sunday, 15 April 2007

Project Description

Title: Implement XEP-33 Extended Stanza Addressing and other XEPs on ejabberd

Organization: XMPP Standards Foundation

Student: Bernardo Antonio de la Ossa Pérez

Mentor: Mickaël Rémond

Abstract:

I propose to implement 'XEP-0033 Extended Stanza Addressing' as a component in ejabberd. This protocol aims to reduce traffic between Jabber servers when both of them support the protocol and the clients send broadcast messages. ejabberd's MUC and PubSub components will be updated to use this protocol.

This protocol is already used in several Jabber servers and clients. Implementing it in ejabberd is an important step in the adoption process.

To allow me to get some experience in XEP reading and implementation, I also propose to implement (or update) three smaller protocols in ejabberd, namely:
XEP-0133: Service Administration
XEP-0157: Contact Addresses for XMPP Services
XEP-0203: Delayed Delivery

With this project, ejabberd will provide more features and the existing features will be more protocol-compliant. Jabber as a whole will benefit from ejabberd's support of Extended Stanza Addressing. And finally, I'll get valuable experience in protocol implementation and knowledge of XMPP internals.

Detailed description:

The main purpose of this proposal is to write a module for ejabberd that implements 'XEP-0033 Extended Stanza Addressing' as a component. This XEP can reduce traffic in certain situations, as it allows Jabber entities to specify several recipients in a single XMPP message. Instead of sending a stanza for each user, a single stanza can be sent specifying all the destination addresses.

The MUC and PubSub components included in ejabberd will be updated to support Extended Stanza Addressing. Bandwidth consumption is one of the main arguments against using MUC instead of IRC, so this proposal will help spread the use of MUC and thus of XMPP.

This extension requires implementation on both clients and servers. Some Open Source programs already implement it: the OpenFire server and the Psi 0.11 client. Several big public and federated Jabber servers use ejabberd (including jabber.org and jabber.ru). Hence, I expect the addition of this XEP to ejabberd will provide a noticeable reduction in overall bandwidth consumption and will encourage other client and server developers to support it.

By using Epeios, this ejabberd component could be run with any Jabber server, for example jabberd14 or jabberd2.

The second part of my proposal is to finish the implementation of three more XEPs, which are already partially implemented, but based on early versions of those XEPs, or even written before the related XEPs existed.

The protocols I've selected for this part are:
XEP-0133: Service Administration
XEP-0157: Contact Addresses for XMPP Services
XEP-0203: Delayed Delivery
I'll now comment on them in more detail.

ejabberd already implements some service administration commands. However, they are a direct migration from a very early administration protocol written by Alexey Shchepin back in 2003. My task is to verify that the current implementation adheres to the protocol, and to add the remaining commands. On the client side, this protocol requires XEP-0004 Data Forms and XEP-0050 Ad-Hoc Commands, which are already implemented in several Jabber clients, including Psi 0.11, Gajim, and Tkabber.

Regarding 'XEP-0157: Contact Addresses for XMPP Services', this is a very small protocol. I'll use it to get some experience in XEP-reading before I focus on the larger parts of my proposal.

Finally, ejabberd is said to implement 'XEP-0091: Delayed Delivery'. My task in this respect is to update the current implementation to XEP-0203. This should be the next step in my learning phase.

Benefits to Community:

My proposal consists of implementing or updating some useful protocols in ejabberd, a widely deployed and used Jabber server. This will help reduce bandwidth consumption both on the local Jabber server and on the remote one. Its integration into the MUC component will make MUC more competitive with IRC.

Additionally, this component will be pluggable into other Jabber servers like jabberd14 and jabberd2.

Deliverables:

During April and May: 'Read, understand and design'
- Read the protocols, understand them and clarify all the initial doubts. This will also serve to get in contact with my mentor, the XEP author, the standards mailing list and maybe other implementors.
- Initial design of the XEP-33 component.
- Regarding the other smaller XEPs: check which parts of them are already implemented, and whether they require an update.
At the end of this preliminary phase I will have fully understood the four protocols, and I will know what needs to be done and how to do it.

First week of June: 'Implement Contact Addresses and Delayed Delivery'
At the end of the week I will have finished implementing Contact Addresses and updated ejabberd's implementation of Delayed Delivery. The code will be tested and marked as finished.

From second week of June until second week of July: 'Implement Extended Stanza Addressing'
Implement the full component. Test it with other implementations. At the end of this phase, the component should provide full support for the protocol, and be completely debugged and stable.

Third and fourth weeks of July: 'Give additional features'
Once the component is stable, I'll update the ejabberd modules mod_muc and mod_pubsub to use XEP-33 whenever possible.
At the end of this phase, the implementation of XEP-33 will be finished.

First and second week of August: 'Implement Service Administration'
First, I'll update the existing commands to the protocol. Once all the existing code is acceptable, I'll implement the remaining commands. At the end of this phase the service administration module will be marked as finished.

Third week of August:
If I complete all the previous phases on time, I'll have a full week to write developer documentation. In it I'd describe the API of ejabberd's ad-hoc commands and data forms implementation and how to use it.

When Summer of Code ends, I hope my work will be reviewed and included in mainstream ejabberd. Of course, I'll be available to fix any bugs that may be found in the future.

Open Source experience:

A summary of my involvement in Jabber projects:

Co-founder of the Spanish Jabber site jabberes.org, co-administrator of the Jabber server, Drupal administrator, content writer (since Sep 2003).

Involved in Tkabber since Dec 2003: Drupal administrator, tutorial writer, bug reporter, Spanish translator, packager of Tkabber-Pack and Tkabber-Starpack.

Involved in ejabberd since Oct 2004: Drupal administrator, tutorial writer, code contributions, and technical assistance on ejabberd's web forum, mailing list and chatroom.

I was once contracted to develop a small module, later published as GPL [3]. My contractor was so happy with the service that he paid me a small bonus over the initially negotiated amount.

Why do I want to work on this particular project?

I've written several modules and patches for ejabberd since 2004. However, all of them focus on making administration tasks easier and are not XMPP-related at all. Examples are: logging (messages and chatrooms to HTML and XML), new means of administration (XML-RPC, command line), automated password recovery, statistics gathering...

Until now I've avoided protocol implementation. With this proposal I'll get valuable knowledge and experience in this subject. So, after this project I will be able to contribute not only to interface tasks, but also to protocol ones, both as ejabberd core code and as external components usable by any Jabber server.

I don't plan to take on employment, classes, or any other task that may conflict with this project.

Education:

I obtained a BEng in Technical Engineering in System Data Processing at the School of Computer Science at the Polytechnic University of Valencia (Sep 1997 - July 2000). I later continued my studies and obtained an M.Sc. in Computer Science Engineering at the Faculty of Computer Science at the same university (Sep 2000 - July 2003).

Currently I'm halfway through my Ph.D. thesis work in Computer Science at the Department of Systems Data Processing and Computers at the same university. My main research interest is web prefetching techniques. I've developed a complete environment to test and benchmark such techniques in real-world circumstances. This environment is written in Erlang, is highly parametrizable, provides many types of performance statistics and can be used in real scenarios.

I've published several papers related to web architecture and web prefetching.

[1] http://ejabberd.jabber.ru/mod_muc_log
[2] http://ejabberd.jabber.ru/mod_ctlextra
[3] http://ejabberd.jabber.ru/mod_muc_log_xml

Friday, 13 April 2007

Proposal accepted at GSoC 07!

So, my proposal was accepted at Google Summer of Code 2007.

During this summer I will implement Extended Stanza Addressing in ejabberd. Since I don't have experience in protocol implementation, I will first experiment by implementing/updating smaller protocols, like Delayed Delivery and Contact Addresses for XMPP Services. Finally, if I have enough time at the end of the summer, I'll update the current implementation of Service Administration.

With all this work I will not only add some features to a widely used Jabber server, but also gain more knowledge of XMPP internals, get more involved in the XSF, and gain experience in Erlang and ejabberd coding.

Extended Stanza Addressing reduces bandwidth usage not only on the local Jabber server, but also on the destination servers. So adding support for this to ejabberd will benefit the whole federated XMPP network. Or something like that... ;)

The posts related to this project will have the gsoc tag, and this is the gsoc RSS feed.