We all know that Rails has a rocky history with threads. Sadly, that also seems to include the Rails port of one of my favourite Merb features: run_later.

Basically, run_later takes a block, turns it into a Proc, and sends it to a worker thread for later execution. That way the request processing is not hindered in any way. Say you want to send a “you have just signed up” email to your users: this is a perfect solution. It is easy to use, lightweight (you don’t need extra middleware), and semi-reliable (no fallback when your system breaks down).

However, I ran into a few problems using mattmatt’s solution, at least in development mode:

- the app regularly ran into class (un)loading issues, and
- Rails’ MySQL adapter apparently didn’t dispose of used database connections, so new connections were refused.

While all of the above can be explained as quirks of the development environment, it didn’t increase my trust in that solution. (I have to point out here that mattmatt’s code looks pretty good to me, and that these problems more likely arise from the somewhat idiosyncratic behaviour of Rails towards threads.)

That got me thinking: why use threads in the first place? After all, what we need is a defined point during request processing that gives us a hook to run some piece of code, and which occurs after the request’s response has been sent back to the client. And, yes, thanks to Metal, I found one.

So here is a solution. It is not as feature-complete as mattmatt’s, tests are still missing, and it blocks the server process until the run_later block is finished. (But you have more than one server process running, haven’t you?)
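
To illustrate the idea (this is a sketch, not the code referred to above): the following uses a plain Rack middleware instead of the Metal hook. All names in it (RunLater, BodyWrapper, AfterResponse) are made up for this example. It relies on the Rack convention that the server calls close on the response body once it is done with it; whether the client has really received every byte by then depends on the server.

```ruby
# Hypothetical sketch: RunLater, BodyWrapper and AfterResponse are illustrative names.
module RunLater
  # Blocks queued during the current request, kept in Thread.current
  # so that concurrent requests do not mix their queues.
  def self.queue
    Thread.current[:run_later_queue] ||= []
  end

  # Register a block to run once the response has been handed to the server.
  def self.run_later(&block)
    queue << block
  end

  # Run and clear all queued blocks; a failing block does not stop the others.
  def self.flush
    blocks = queue.dup
    queue.clear
    blocks.each do |block|
      begin
        block.call
      rescue StandardError => e
        warn "run_later block failed: #{e.message}"
      end
    end
  end

  # Wraps a Rack response body and flushes the queue when the server closes it.
  class BodyWrapper
    def initialize(body)
      @body = body
    end

    def each(&block)
      @body.each(&block)
    end

    def close
      @body.close if @body.respond_to?(:close)
    ensure
      RunLater.flush
    end
  end

  # Rack middleware: passes the request on and wraps the resulting body.
  class AfterResponse
    def initialize(app)
      @app = app
    end

    def call(env)
      status, headers, body = @app.call(env)
      [status, headers, BodyWrapper.new(body)]
    end
  end
end
```

Inside a request you would call RunLater.run_later { ... }. As with the solution described above, the queued blocks run in the server process itself, so that process is busy until they are finished.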

Does someone out there want to help move that into a regular plugin?


If you are running across this error

[BUG] cross-thread violation of rb_gc()
ruby 1.8.6 (2008-08-11) [universal-darwin9.0]


Say you are installing the latest so-called “stable” Typo version on your OS X machine: then you might see the above error. Here is the solution: *remove the bundled json gem!* It is version 1.1.3, which is way old, and apparently it contains some binaries that were compiled using ruby 1.8.6. Just install the “json” gem locally.


I guess you are using rcov too, to check that all your code was active in your tests at least once. Well, then, here is some bad news:

  • rcov code coverage only checks whether or not a line was executed, meaning any code from that source line. And this usually trips on
    do_something if some_condition?
    

    because what happens if some_condition? always fails in your tests? (say, “Rails.env.production?”). A somewhat chatty way to detect such cases is to write it down on multiple lines:

    if some_condition?
      do_something
    end
    
  • Due to the heavily dynamic structure of a Ruby application, nearly nothing is guaranteed to yield the same results when called at a different time or in a different context. Even code as simple as
    %w(a b c).sort
    

    may yield results that are unexpected for the naive reader. So even if you have 100% code coverage, the entire test framework that runs the tests for you might behave differently depending on whether it is called from rcov, and then your code might behave differently within the framework and without it.
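
The first point can be reproduced with a small experiment. The sketch below uses the Coverage module from Ruby's standard library as a stand-in for rcov (an assumption made for convenience; like rcov it counts executions per line). The helper line_counts_for and the demo methods are made up for this example.

```ruby
require "coverage"
require "tempfile"

# Demo source: the same conditional call, once on one line, once on three.
COVERAGE_DEMO_SOURCE = <<~RUBY
  def do_something
    :done
  end

  def one_liner(some_condition)
    do_something if some_condition
  end

  def multi_liner(some_condition)
    if some_condition
      do_something
    end
  end

  one_liner(false)
  multi_liner(false)
RUBY

# Hypothetical helper: loads the source from a temp file with line coverage
# enabled and returns the per-line execution counts (nil = not a countable line).
def line_counts_for(source)
  file = Tempfile.new(["coverage_demo", ".rb"])
  file.write(source)
  file.close
  Coverage.start
  load file.path
  result = Coverage.result
  _path, counts = result.find { |path, _| File.basename(path) == File.basename(file.path) }
  counts
ensure
  file.unlink if file
end

counts = line_counts_for(COVERAGE_DEMO_SOURCE)
COVERAGE_DEMO_SOURCE.lines.each_with_index do |line, index|
  puts format("%-4s %s", counts[index].inspect, line)
end
```

The condition is false in both calls, so do_something is never reached through either method. Still, the one-liner is reported as an executed line, while in the multi-line version the line holding the call shows up with a count of 0.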

No, I don’t strive for 100% test coverage. I employ TDD every now and then: there are cases where that works just perfectly. I employ a big enough test suite to give me confidence in my app. And I employ black box tests: the real world is where the real app will be running, so testing against that is what gives me confidence in my application.


In the ever ongoing fight against spam there is one really wonderful weapon: greylisting. For those who don’t know how it works: whenever an email server sends an email for the first time, the receiving mail server rejects the email with a temporary error. The idea is that a legitimate server resends the email after a certain period of time, and then the email gets through, but a spam sender is likely not to resend the mail.

Which works like a charm (I have had a spam ratio of less than 1% since I enabled greylisting, and that without any spam classification on my servers) unless… you are registering at a new web service. In that case the confirmation email doesn’t get through the first time, and I have to wait for some indeterminate time. On countless services I just decided that the wait was not worth my time…

If you want to give me and other greylisters a smooth user experience, consider the following:

  • reduce the grace period before your mail server retries a deferred email from one hour (which is quite likely the default value in your Linux distribution too) to a value around 5 minutes, and/or
  • modify your application so that a registered user may use the service for a certain time without having confirmed the confirmation email.

I understand that the second option is some work and might not always be possible; but the first option is just a simple configuration setting and should take your system admin no longer than 5 minutes.
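
The mechanism, and why the sender's retry interval matters so much, can be sketched in a few lines of Ruby. This is a toy model, not how any particular mail server implements greylisting; the class name Greylist, the min_delay parameter and the (client IP, sender, recipient) key are assumptions made for illustration.

```ruby
# Toy greylisting model. Assumption: a delivery attempt is identified by the
# triplet (client IP, envelope sender, envelope recipient).
class Greylist
  # min_delay: seconds that must pass after the first attempt before the
  # same triplet is accepted.
  def initialize(min_delay)
    @min_delay = min_delay
    @first_seen = {}
  end

  # Returns :defer (answer with a temporary 4xx SMTP error) or :accept.
  def check(client_ip, sender, recipient, now = Time.now)
    key = [client_ip, sender, recipient]
    first_attempt = (@first_seen[key] ||= now)
    (now - first_attempt) >= @min_delay ? :accept : :defer
  end
end

greylist = Greylist.new(300)
start = Time.now
triplet = ["192.0.2.10", "signup@example.com", "me@example.org"]

puts greylist.check(*triplet, start)        # first attempt is deferred
puts greylist.check(*triplet, start + 300)  # a retry after 5 minutes gets through
```

A sender that retries after 5 minutes delivers the confirmation email 5 minutes late; a sender that retries only once per hour makes the new user wait a full hour, even though the receiving side would have accepted the mail much earlier.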


Just found out that mongrel_rails restart does NOT work when the original mongrel was started with the -c option. Yuk!

Well, this is not entirely true. It works if the -c parameter contains an absolute path, or refers “to itself”:
-c ../current works like a charm, and in fact updates the current directory to wherever current points at that moment.

However, I could not get mongrel_rails restart --soft to work…


Well, that is of course total bullshit. With that title I will conduct a small experiment.

I found that the least informative of my blog posts attract the most readers. So if this holds true for this post (and a title like that should attract at least some attention) I will add more noise here. Promise!


It is not Christmas, hence no quiz, but this was strangely surprising:

class A
  @@t = "A::@@t"
  T="A::T"

  def self.s1; @@t; end
  def self.s2; T; end
end

class B < A
  def self.s1; @@t; end
  def self.s2; T; end
end

class C < A
  @@t = "C::@@t"
  T="C::T"
  def self.s1; @@t; end
  def self.s2; T; end
end

[ A.s1, A.s2, B.s1, B.s2, C.s1, C.s2 ]

gives you

["C::@@t", "A::T", "C::@@t", "A::T", "C::@@t", "C::T"]
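
The reason, as far as I can tell: a class variable is shared by the whole class hierarchy, so assigning @@t in a subclass overwrites the one defined in A, whereas assigning a constant in a subclass creates a new constant that merely shadows the inherited one. A stripped-down version of the classes above makes that visible:

```ruby
class A
  @@t = "A::@@t"
  T = "A::T"
end

class C < A
  @@t = "C::@@t"  # no new variable: this reassigns the @@t that belongs to A
  T = "C::T"      # a new constant C::T; A::T is left untouched
end

p A.class_variable_get(:@@t)  # => "C::@@t"
p A.class_variables(false)    # => [:@@t]
p C.class_variables(false)    # => []
p A.constants(false)          # => [:T]
p C.constants(false)          # => [:T]
p A::T                        # => "A::T"
p C::T                        # => "C::T"
```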


class Array
  def to_proc
    proc do |obj|
      self.map { |sym| obj.send(sym) }
    end
  end
end

lets you write


Account.all.map(&%w(id email))
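
To try this without a database, a Struct can stand in for the model. In this sketch the Account struct and the sample data are made up; the Array#to_proc definition is the one from above.

```ruby
class Array
  # Turns an array of method names into a proc that calls each of them
  # on its argument and returns the results as an array.
  def to_proc
    proc do |obj|
      map { |sym| obj.send(sym) }
    end
  end
end

# Stand-in for an ActiveRecord model (assumption for this example).
Account = Struct.new(:id, :email)

accounts = [
  Account.new(1, "alice@example.com"),
  Account.new(2, "bob@example.com")
]

p accounts.map(&%w(id email))
# => [[1, "alice@example.com"], [2, "bob@example.com"]]
```
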

This benchmark compares thinking_sphinx with acts_as_xapian. We need a search engine that gives us the IDs of matching documents from a fulltext index, basic text search only.

Data

  • one table with 200k entries with 5k of text (avg) in one column
  • one table with 500k entries with 7k of text (avg) in 6 columns
  • one table with 500k entries with 7k of text (avg) in 4 columns

Indexing

Initial indexing took 10 mins with thinking_sphinx and 75 mins(!!) with acts_as_xapian.

Search performance

The search performance on queries that return only a few items is nearly identical.

The search performance on queries that return many items (~10000) is nearly
identical; 90% of the time is spent in ActiveRecord.

In our case (we only need IDs and not the entire documents) sphinx runs
at 0.6 secs for a particular query (with 10000 results),
where acts_as_xapian needs 4.5 secs. This is because thinking_sphinx allows
you to fetch only the ids, where acts_as_xapian insists on pulling the
models from the database. When patching acts_as_xapian to allow for pulling
ids only, we land at 0.6 vs 0.4 secs.

Results

We will choose sphinx because

  • it is similarly fast to xapian
  • it runs over the network by default
  • indexing is way faster (I guess because acts_as_xapian pulls all data to be indexed from the database to hand it over, while sphinx can do that itself)
  • acts_as_xapian would need to be patched for performance reasons.

And here is some food for our beloved web spiders


While some of my 5 readers on average (per day) might already know, for everyone else: CouchDB will be part of the next Ubuntu release. And that one comes with long-term support.

Congratulations, Couchies.