As there lately have been several pres with the wrong number of reference frames
  we thought we'd explain the reasoning behind the rule set.

  On various x264 pages on the internet the following can be read about reference frames:
  "Selects the maximum number of reference frames that can be used. Referenced frames
  are frames that refer to other frames (eg. if both frames are similar). Having a high 
  referenced frame will improve quality but slow up encoding. For typical content, 
  a reference frame of 3 to 5 is recommended. For content with a lot of repetition 
  (eg. animation), a reference frame of 8 to 10 can be used."
  So, higher reference frames means higher quality, why do we then enforce max 4 
  reference frames on high resolution video? It has to do with hardware players. 
  The popcorn hour, twix, wd etc. all support Level 4.1 (L4.1) of the ITU-T h264 
  specification. All graphic cards that have DXVA also support L4.1. So let's see 
  what L4.1 says about reference frames.
  In table 'A-1 Level limits' on page 283 (pdf 305) of the ITU-T specification it 
  says that MaxDPB for L4.1 is 12288 KiB. MaxDPB is the Maximum Decoded Picture Buffer, 
  which is the largest size allowed of the decoded picture buffer when decoding a video.
  By supporting L4.1, the hardware players must have at least 12 MiB of buffer for 
  storing the decoded pictures. This means that a video that requires a buffer of 
  13 MiB is not guaranteed to work on one of these players.
  As 16 * 16 pixels macroblocks are used, all resolutions needs to be mod16, for ease of
  reading, the maths to make them mod16 is not included below.
  The DPB in KiB is calculated as follows:
  DPB = vertical resolution * horizontal resolution * 1.5 * reference frames / 1024
  If we transform this formula to get the reference frames instead we get:
  ref = 12288 * 1024 / (vertical resolution * horizontal resolution * 1.5)
  We of course can't use partial frames for referencing and thus the reference frames 
  should be rounded down to the closest integer. We can also transform this to get the 
  maximum vertical resolution for a specific reference frames value, here we need to
  round the vertical resolution down to mod16:
  vertical res = 12288 * 1024 / (horizontal resolution * 1.5 * reference frames)
  With the above formula we can conclude that the highest vertical resolution that we 
  can have ref 5 on and still be L4.1 compliant is 864 pixels.
  873.813333 = 12288 * 1024 / ( 1920 * 1.5 * 5 )
  864 = floor( 873.813333 / 16 ) * 16
  The 1.5 in the calculations above is the YV12 colourspace, it needs 12 bits to store 
  1 pixel. In other words, 1.5 bytes per pixel.
  So, to conclude this, the reason we put ref 4 as max for movies with vertical resolution
  greater than 864 in rules is not because we want to be able to encode releases faster. 
  It's because we want releases to be L4.1 compliant and thus possible to play on the 
  popcorn hour, twix and other hardware players. And we require at least ref 5 on all 
  videos where it's possible while still respecting L4.1, this to ensure high quality.
  ITU-T specification:
  12 bits per pixel for YV12: