Research talk:Measuring edit productivity/Work log/2015-09-18

Friday, September 18, 2015

OK! So I'm running a job on stat1003 and I've learned about two issues.

  1. The output queue that para uses to parallelize the processing work needs a fixed size, or memory is going to become a huge issue. When I run the job on a single file (no output queue), memory usage is minimal.
  2. The first problem implies that the mappers can produce output far faster than the bzip2 stream can write it. That means we need to multiprocess the bzip2 compression. I filed a feature request to add that. I'll primarily be digging into that today.
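
Both ideas can be sketched in a few lines. This is only an illustration under stated assumptions, not para's actual implementation: the names `produce`, `drain_in_chunks`, `write_compressed` and `run_demo` are hypothetical, as are the default sizes. A queue created with `maxsize` makes the producer block once it is full, which caps memory. Each chunk is compressed as its own bzip2 stream and the streams are written in order, at most `max_in_flight` at a time. A thread pool stands in for multiprocessing to keep the sketch simple; `ProcessPoolExecutor` offers the same `submit` interface.

```python
import bz2
import io
import queue
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def produce(lines, out_queue):
    """Stand-in for a mapper: put() blocks while the queue is full."""
    for line in lines:
        out_queue.put(line)
    out_queue.put(None)  # sentinel: no more output


def drain_in_chunks(out_queue, chunk_size):
    """Group queued lines into byte chunks until the sentinel arrives."""
    chunk = []
    while True:
        item = out_queue.get()
        if item is None:
            break
        chunk.append(item)
        if len(chunk) >= chunk_size:
            yield "".join(chunk).encode("utf-8")
            chunk = []
    if chunk:
        yield "".join(chunk).encode("utf-8")


def write_compressed(chunks, out_file, workers=2, max_in_flight=4):
    """Compress chunks concurrently; write the bzip2 streams in input order."""
    pending = deque()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in chunks:
            pending.append(pool.submit(bz2.compress, chunk))
            # Cap the number of unwritten results so memory stays bounded.
            if len(pending) >= max_in_flight:
                out_file.write(pending.popleft().result())
        while pending:
            out_file.write(pending.popleft().result())


def run_demo(lines, queue_size=100, chunk_size=1000):
    """Wire one producer to the compressing writer through a bounded queue."""
    out_queue = queue.Queue(maxsize=queue_size)  # fixed size: the key point
    producer = threading.Thread(target=produce, args=(lines, out_queue))
    producer.start()
    buffer = io.BytesIO()  # assumption: stands in for the real output file
    write_compressed(drain_in_chunks(out_queue, chunk_size), buffer)
    producer.join()
    return buffer.getvalue()
```

The output is a concatenation of independent bzip2 streams, which `bz2.decompress` and the `bzip2` command-line tool both read back as a single file. The trade-off is a slightly worse compression ratio than one continuous stream.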

--Halfak (WMF) (talk) 14:31, 18 September 2015 (UTC)
