Saturday, 24 November 2007

GSoC Acknowledgments

I would like to thank several people for their help in my Google Summer of Code project:

I also want to express my general gratitude to some particular organizations and projects:

GSoC Commemorative Photo

Students who successfully completed their Google Summer of Code 2007 project receive more than the satisfaction and experience of collaborating on Free and Open Source Software, 4,500 US dollars and two lines to add to their CV. We also get several gifts: a book (Karl Fogel's Producing Open Source Software), a certificate, a T-shirt and two stickers.

Now that I have received the T-shirt and certificate, it's time to take a commemorative photo.


Some arbitrary comments:

  • With the first payment I received from GSoC at the beginning of the summer I bought Joe Armstrong's new book: Programming Erlang, Software for a Concurrent World. I read it while I was developing my project, and had the chance to apply several lessons. I also read many parts of the open source book, and even tried to learn to use Emacs (and succeeded only partially...).
  • The papers next to the monitor are 30% design diagrams, development notes and todo lists for the project. The other 70% are XEPs that I had to print so I could study them carefully.
  • 4,500 USD converts to just 3,200 euros. With the cost of living here nowadays, this amount barely lasts as long as the time it took to earn. By contrast, the two lines in the CV and the experience gained look more profitable in the long term. Until then, I guess I can sell the stickers on eBay or something.
  • Some weeks ago I learnt the hard way that my watch, a 6-year-old Casio Databank, was not water resistant. The watch I wear now is a gift I received when I was 14, and I had never worn it until now.

Friday, 23 November 2007

GSoC Status Update: November'07

It has been three months since I posted the Final GSoC project status. It's time to report what happened with the remaining tasks.

The tasks that I've completed since then are:

  • Perform code profiling to find bottlenecks and deficiencies in mod_multicast. Improve the code.
  • Once all possible optimizations are made, perform benchmarks to check mod_multicast's effect on CPU, RAM and traffic consumption.
The tasks that I haven't completed yet are:
  • Wait for ejabberd code reviewers, in case I need to fix any problem in my XEP-0033 patches for ejabberd before they are applied to ejabberd trunk.
  • Discuss potential security and spam vulnerabilities (talk in the JDEV and JADMIN mailing lists).
  • Add XEP-0033 support to ejabberd's Pub/Sub and/or PEP service once their codebase is stable.
  • Wait for Peter Saint-Andre's questions regarding his XEP-0033 update.
So this adventure has not ended yet.

Thursday, 4 October 2007

Travel to Costa Rica

I am travelling to San José, Costa Rica for a whole week. The expected return is on Sunday, 15th of October.

Tuesday, 2 October 2007

Results of mod_multicast execution time with timer:tc

I previously presented timer:tc and multicast_test. Now I'll illustrate some results I obtained with them.

I ran 6 experiments: normal routing, trusted multicast and untrusted multicast, each with both single and multiple servers. Each experiment was run with several numbers of destination addresses.

The time is expressed in microseconds (1 second = 1,000,000 microseconds). Note that the time shown here includes not only mod_multicast, but also packet building and ejabberd_router.
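As a reminder of how to read these numbers: timer:tc/3 returns a {Microseconds, Result} tuple, and the per-destination figure is simply the execution time divided by the number of destinations. Here is a minimal sketch of that; the module name tc_sketch and the lists:seq/2 stand-in workload are hypothetical and are not part of multicast_test.

```erlang
-module(tc_sketch).
-export([per_destination/2, measure/1]).

%% Time per destination, in microseconds.
per_destination(ExecTime, NumDests) when NumDests > 0 ->
    ExecTime / NumDests.

%% Time a stand-in workload (hypothetical; the real tests route packets).
%% timer:tc/3 returns {ElapsedMicroseconds, ReturnValue}.
measure(NumDests) ->
    {Micros, _Result} = timer:tc(lists, seq, [1, NumDests]),
    {NumDests, Micros, per_destination(Micros, NumDests)}.
```

For instance, tc_sketch:per_destination(6074, 101) gives about 60.14, which is how the 101-destination row of the Normal - Single results is derived.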


Results

Normal - Single

Destinations  Exec time (microseconds)  Time per destination (microseconds)
1             161                       161.00
101           6074                      60.14
201           22163                     110.26
301           40950                     136.05
401           74586                     186.00
501           100445                    200.49
601           107012                    178.06
701           131163                    187.11
801           146473                    182.86
901           153147                    169.97

Normal - Multiple
Destinations  Exec time (microseconds)  Time per destination (microseconds)
1             196                       196.00
101           11206                     110.95
201           20918                     104.07
301           30811                     102.36
401           43627                     108.80
501           52742                     105.27
601           59650                     99.25
701           75291                     107.41
801           81077                     101.22
901           80192                     89.00

Trusted - Single
Destinations  Exec time (microseconds)  Time per destination (microseconds)
1             226                       226.00
101           8719                      86.33
201           39938                     198.70
301           77040                     255.95
401           101675                    253.55
501           122156                    243.82
601           144422                    240.30
701           158917                    226.70
801           186553                    232.90
901           197387                    219.08

Trusted - Multiple
Destinations  Exec time (microseconds)  Time per destination (microseconds)
1             283                       283.00
101           21751                     215.36
201           41077                     204.36
301           66047                     219.43
401           102542                    255.72
501           315313                    629.37
601           140158                    233.21
701           164836                    235.14
801           171249                    213.79
901           189712                    210.56

Untrusted - Single
Destinations  Exec time (microseconds)  Time per destination (microseconds)
1             317                       317.00
101           25528                     252.75
201           79402                     395.03
301           155075                    515.20
401           284532                    709.56
501           329484                    657.65
601           357879                    595.47
701           388755                    554.57
801           414622                    517.63
901           506557                    562.22

Untrusted - Multiple
Destinations  Exec time (microseconds)  Time per destination (microseconds)
1             1121                      1121.00
101           115736                    1157.36
201           331379                    1656.89
301           601325                    2004.42
400           2478072                   6195.18
...


Analysis

What does this show? The computation time per destination address remains roughly constant as the number of destinations grows. The fluctuations are due to my computer: it was being used for other tasks while running those experiments [1].

For example, processing a packet with 300 destination addresses (few rosters and chatrooms have more than this number of items) costs around 77 milliseconds if the source is trusted (a local MUC service or session manager). The same packet costs 155 milliseconds if the source is not trusted and the packet must be carefully inspected.

On the less bright side of the results, the Multiple set of experiments again performs quite badly compared to Single. Note that part of this is due to the function add_addresses, as explained in the Fprof article. Another part of the problem is not in the server, but in the stressing tool: the building function I implemented in multicast_test puts non-existent server names when it's run with the 'Multiple' option: "5.localhost", "7.localhost"...


Conclusions

In summary, the results I obtained using timer:tc are compatible with the previous results I obtained with Fprof and Jabsimul. All of them indicate that mod_multicast consumes, on average, approximately as much time as the ejabberd routing functions do.

So, using multicast increases the CPU consumption, as was expected. This cost will be acceptable once the benefits of multicasting are taken into account.

---
[1] Statistically speaking, it would be preferable to run a batch of experiments and show only the average and confidence interval, but I guess this is not really required for now.
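The batch approach mentioned in [1] could be sketched as follows. This is only an illustration: the module name batch_stats is hypothetical, and the 1.96 factor assumes a normal approximation, which is only reasonable when the batch is fairly large.

```erlang
-module(batch_stats).
-export([mean/1, ci95/1, timed_runs/2]).

%% Arithmetic mean of a non-empty list of numbers.
mean(Xs) ->
    lists:sum(Xs) / length(Xs).

%% Sample standard deviation (needs at least two samples).
stddev(Xs) ->
    M = mean(Xs),
    SumSq = lists:sum([(X - M) * (X - M) || X <- Xs]),
    math:sqrt(SumSq / (length(Xs) - 1)).

%% Approximate 95% confidence interval of the mean
%% (normal approximation, an assumption of this sketch).
ci95(Xs) when length(Xs) >= 2 ->
    M = mean(Xs),
    Half = 1.96 * stddev(Xs) / math:sqrt(length(Xs)),
    {M - Half, M + Half}.

%% Run Fun (a fun of arity 0) N times; return the elapsed microseconds.
timed_runs(Fun, N) ->
    [element(1, timer:tc(erlang, apply, [Fun, []])) || _ <- lists:seq(1, N)].
```

A call such as batch_stats:ci95(batch_stats:timed_runs(F, 30)) would then give an interval instead of a single noisy measurement.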

Results of code profiling mod_multicast with Fprof

I previously presented Fprof and multicast_test. Now I'll illustrate some results I obtained with them.

I tried normal routing, trusted multicast and untrusted multicast, each with both single and multiple servers, and with a fixed number of 300 destination addresses.

All those experiments generate a lot of information, so I summarize here only the important results. Times are in milliseconds, and measure the full processing time in the system where the experiments were run.


Results

Normal - Single

Total: 320
build_packet: 30
ejabberd_router: 275

Normal - Multiple

Total: 630
build_packet: 30
ejabberd_router: 593

Trusted - Single

Total: 410
build_packet: 30
ejabberd_router: 266
mod_multicast: 114
string_to_jid already requires: 52

Trusted - Multiple

Total: 2020
build_packet: 30
ejabberd_router: 700
mod_multicast: 1290
add_addresses consumes too much here: 1067

Untrusted - Single

Total: 500
build_packet: 30
ejabberd_router: 270
mod_multicast: 200
string_to_jid requires: 50

Untrusted - Multiple

Total: 2060
build_packet: 30
ejabberd_router: 640
mod_multicast: 1390
add_addresses consumes too much here: 1070


Analysis

In those examples, using mod_multicast does not reduce the time consumed by ejabberd_router because the multicast packet is meant to be sent to local users, so the routing process is always called 300 times. If the destinations were not local, and some of them were on the same servers, ejabberd_router would be called less frequently, so the benefit of mod_multicast would be noticeable in that aspect too.
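To make that argument concrete, here is a rough sketch that counts routing calls. It assumes one call per local destination and a single call per distinct remote server, which only holds if every remote server accepts multicast packets. The module and function names are hypothetical and do not come from mod_multicast.

```erlang
-module(route_count).
-export([router_calls/2]).

%% Dests is a list of {User, Server} pairs.
%% Assumption: each local destination needs its own routing call, while
%% all destinations on the same remote server share a single call.
router_calls(LocalServer, Dests) ->
    {Local, Remote} =
        lists:partition(fun({_U, S}) -> S =:= LocalServer end, Dests),
    RemoteServers = lists:usort([S || {_U, S} <- Remote]),
    length(Local) + length(RemoteServers).
```

With 300 local destinations this returns 300, as in the experiments above; with 300 destinations spread over 3 remote servers it would return 3.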

In the Single experiments, Trusted multicast requires less than half of the time required by ejabberd_router itself. In the case of Untrusted, the additional checks make mod_multicast almost as costly as ejabberd_router.

In the Single experiments, the function in mod_multicast that consumes the most processing time is jlib:string_to_jid. In the Multiple experiments, the most problematic function is add_addresses. Maybe the code of that function can be improved.

Summarizing, I consider that the mod_multicast code is fairly efficient when compared to other parts of ejabberd, especially ejabberd_router.

multicast_test: analyze mod_multicast performance

To measure mod_multicast computation consumption I developed a small Erlang module: multicast_test.

This module includes functions to create a message packet with an XEP-0033 'addresses' element. The number of destinations is configurable. The server of each destination can be 'single', so all destinations are on the same server, or 'multiple', so each destination is on a different server. This packet can be sent to route_trusted or route_untrusted. It is also possible to send individual packets to ejabberd_router.
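As an illustration of what such a packet looks like, here is a minimal sketch. It is not the actual multicast_test code: the module name, the function names and the JID naming scheme are hypothetical, and it assumes ejabberd's {xmlelement, Name, Attrs, Children} tuple representation. The namespace is the one defined by XEP-0033.

```erlang
-module(addresses_sketch).
-export([build_packet/2]).

-define(NS_ADDRESS, "http://jabber.org/protocol/address").

%% Hypothetical naming: 'single' puts every user on localhost,
%% 'multiple' puts each user on its own server (1.localhost, 2.localhost...).
dest_jid(single, N) -> "user" ++ integer_to_list(N) ++ "@localhost";
dest_jid(multiple, N) -> "user@" ++ integer_to_list(N) ++ ".localhost".

%% Build a message stanza with one 'address' child per destination.
build_packet(Servers, NumDests) ->
    Addresses = [{xmlelement, "address",
                  [{"type", "to"}, {"jid", dest_jid(Servers, N)}], []}
                 || N <- lists:seq(1, NumDests)],
    {xmlelement, "message", [{"type", "normal"}],
     [{xmlelement, "addresses", [{"xmlns", ?NS_ADDRESS}], Addresses},
      {xmlelement, "body", [], [{xmlcdata, "test"}]}]}.
```

Calling addresses_sketch:build_packet(multiple, 300) would produce a stanza with 300 'address' elements, each pointing to a different server.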

Code profiling

It is possible to run those functions with Fprof to profile the time consumed by each function in mod_multicast:

fprof:apply(multicast_test, ROUTING, [SERVERS, NUM_DESTS]).
fprof:profile().
fprof:analyse([{dest, []}]).
Where:
  • ROUTING: testn for normal routing, testt for trusted sender, and testu for untrusted sender.
  • SERVERS: single for just a single server, multiple for a different server for each destination address.
  • NUM_DESTS: number of destination addresses.
For example, execute this in the Erlang shell of the ejabberd node:
fprof:apply(multicast_test, testu, [single, 300]).
fprof:profile().
fprof:analyse([{dest, []}]).
And you will get a file fprof.analysis with very detailed information.

Execution time

It's also possible to measure the execution time of those functions with a varying number of destinations. This will show whether the performance of mod_multicast is dependent on the number of destinations or the number of destination servers.

The functions are:
multicast_test:ROUTING(SERVERS, INI, END, INC).
Where:
  • ROUTING: normal, trusted, untrusted.
  • SERVERS: single or multiple.
  • INI, END and INC: the initial number of destinations, the ending value and the increment.
For example:
multicast_test:untrusted(single, 1, 1000, 100).
I will later post some results I obtained using those functions.
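The INI, END and INC parameters behave like the arguments of lists:seq/3. Here is a minimal sketch of such a sweep, assuming the workload is passed as a fun that takes the number of destinations. The module and function names are hypothetical and are not taken from multicast_test.

```erlang
-module(sweep_sketch).
-export([dest_counts/3, sweep/4]).

%% Destination counts to test: Ini, Ini+Inc, ... up to at most End.
dest_counts(Ini, End, Inc) ->
    lists:seq(Ini, End, Inc).

%% Time Fun(N) for every N; returns {N, Microseconds, MicrosecondsPerDest}.
%% Ini must be greater than zero to avoid dividing by zero.
sweep(Fun, Ini, End, Inc) when Ini > 0 ->
    [begin
         {Micros, _Result} = timer:tc(erlang, apply, [Fun, [N]]),
         {N, Micros, Micros / N}
     end || N <- dest_counts(Ini, End, Inc)].
```

With the values from the example above, sweep_sketch:dest_counts(1, 1000, 100) yields 1, 101, 201 and so on up to 901.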