I derived the following graph from the log file of the parsing run:
Another quick thing to report: we've started at least one fold of the 10-fold cross-validation experiment (more detail on that later). In brief, we split the Urdu-English training data into 10 folds, trained a Hiero grammar on 9 of them, and parsed the held-out fold. Separately, we're also parsing the training data from which the grammar was derived.
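For readers who want the setup spelled out, here is a minimal sketch of the fold split in Python. It is not the script we actually ran: the round-robin assignment and the function names `split_into_folds` and `train_and_heldout` are assumptions made for illustration, and the real folds may have been cut differently (for example, as contiguous blocks).

```python
def split_into_folds(sentence_pairs, num_folds=10):
    """Assign sentence pairs to folds round-robin (an assumed scheme)."""
    folds = [[] for _ in range(num_folds)]
    for index, pair in enumerate(sentence_pairs):
        folds[index % num_folds].append(pair)
    return folds


def train_and_heldout(folds, heldout_index):
    """Return (training data, held-out data) for one cross-validation round."""
    heldout = folds[heldout_index]
    # Training data is everything outside the held-out fold.
    training = [
        pair
        for fold_index, fold in enumerate(folds)
        if fold_index != heldout_index
        for pair in fold
    ]
    return training, heldout


if __name__ == "__main__":
    # Toy stand-in for the parallel corpus: 25 numbered "sentence pairs".
    folds = split_into_folds(list(range(25)), num_folds=10)
    training, heldout = train_and_heldout(folds, heldout_index=0)
    print(len(training), "training pairs,", len(heldout), "held-out pairs")
```

In the experiment, the grammar would be extracted from `training`, and both `heldout` and `training` would then be handed to the parser.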
I plan to write another post soon describing the error analysis we hope to perform on this parsing data. In the meantime, here are some raw numbers:
Training on folds 1-9, parsing fold 0: 5928 sentences, 1095 parsed successfully.
Training on folds 1-9, parsing folds 1-9: processed 10919 sentences so far, 4764 parsed successfully.
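To make the two counts easier to compare, the success rates can be computed directly from the figures above. The snippet below is only a convenience calculation; the helper name `parse_success_rate` is made up for this post, and the second run is still in progress, so its rate is provisional.

```python
def parse_success_rate(parsed, total):
    """Percentage of sentences that parsed successfully."""
    return 100.0 * parsed / total


# (parsed successfully, sentences processed), copied from the counts above.
runs = {
    "train on folds 1-9, parse fold 0": (1095, 5928),
    "train on folds 1-9, parse folds 1-9 (so far)": (4764, 10919),
}

for name, (parsed, total) in runs.items():
    rate = parse_success_rate(parsed, total)
    print(f"{name}: {parsed}/{total} = {rate:.1f}%")
```

This works out to roughly 18.5% for the held-out fold and 43.6% for the training data processed so far.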