2024-01-29 report on current draft klk-sv-v2 text elements

The corpus consists of those text elements (corresponding to scanned
pages) of the national library data (including v1 material and new
material) where *at least one sentence was identified as Swedish* by
HeLI-OTS and a specially written post-processor that corrects the
identifications in their context.

The report gives, for each page, summary counts of the languages
identified for its sentences:
- the non-standard code "xxx" stands for no identified language and is
  probably best ignored even if the count is high (OCR artefacts cause
  segmentation artefacts)
- the majority of a page may well be e.g. "fin" rather than "swe"
- each report file (data/YEAR.tsv) covers a year of publication
- each report line, except the header line, pertains to a text element
  (a scanned page)
- the report lines are also counted in the file wc-l.txt
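
As a rough illustration of how the notes above could be applied when
reading a report file, the following Python sketch counts the pages in
one TSV report and tallies the majority language of each page while
ignoring "xxx". The function name, the "title" column and the per-language
column names ("swe", "fin", "xxx") are assumptions made for the example;
the actual header line of data/YEAR.tsv should be checked first.

```python
import csv
import io
from collections import Counter


def page_language_summary(tsv_text, lang_columns):
    """Return (number of pages, Counter of majority languages).

    Assumes (hypothetically) that the first line is a header and that
    each name in lang_columns is a column holding a sentence count.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pages = 0
    majority = Counter()
    for row in reader:
        # Each data line is one text element (one scanned page).
        pages += 1
        # Leave out "xxx" (no identified language), as suggested above.
        counts = {
            lang: int(row[lang] or 0)
            for lang in lang_columns
            if lang != "xxx"
        }
        if counts and max(counts.values()) > 0:
            # On a tie, the language listed first in lang_columns wins.
            majority[max(counts, key=counts.get)] += 1
    return pages, majority


# Made-up sample data, not taken from the actual report files.
sample = "title\tswe\tfin\txxx\nA\t3\t10\t50\nB\t5\t0\t2\n"
pages, majority = page_language_summary(sample, ["swe", "fin", "xxx"])
```

In the made-up sample, page A has more "fin" than "swe" sentences even
though it is part of the corpus, which is the situation described in the
second note above. To process a real file, the text of one data/YEAR.tsv
file would be read and passed in as tsv_text.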

The publication titles are taken as-is from the current form of the
corpus. The one text element nominally dated 2092 may not be the only
one with an incorrect date. Some attempts at improvement may be made
before the corpus is ready for publication in the language bank.
