
The limitations of OCR

While working with the Australia Electoral Rolls collection on Ancestry, I have come across numerous instances where the OCR has not done a very good job.  Some names come out as gobbledygook, while other names are missed completely.

The following is a list of the years available on Ancestry for Victoria.  Those marked with an asterisk have been transcribed. For all other years, the records were extracted using a new OCR indexing method. They were not transcribed.

  • Victoria: 1856*, 1903*, 1905-06, 1908, 1909*, 1910, 1912-13, 1914*, 1915-18, 1919*, 1920-22, 1924*, 1925-28, 1931*, 1932-35, 1936-37*, 1942-43*, 1949*, 1954*, 1958*, 1963*, 1967, 1968*, 1972*, 1977*, 1980*

The years 1905 and 1906 were not transcribed.  The following is the entry for Jemima Marr, the sister of my great-grandfather Henry Palmer Marr, and her husband Thomas Gardiner Nicholls, from the 1906 Electoral roll.

Thomas and Jemima 1906

The entry for Jemima is clearly visible.  However, when I go to attach it to her profile, this is the list of names:

1906 name list

Her husband, Thomas Gardiner Nicholls, is there, but Jemima’s entry has been skipped.

This is the image from 1908:

Nicholls 1908

Again, the image is fairly clear.  This is the list of names:

1908 name list

Ann Nicholls and Jemima Marr Nicholls were her daughters, and Thomas Gardiner Nicholls junior was her son.  As far as I can tell, only the daughters Ann and Jemima appear in the list of names.  The problem is not only that names are missed, but also that the list of names jumps around.  It started with O’Brien, which was at the bottom of the first side of the image, continued with some of the names on the second side, and then jumped back to the first side.  At the end of the list, it had jumped back to O’Brien.

1908 name list 2

The suggestion is to make corrections, which you can do in the correction panel at the bottom of the page.  The only problem is that there doesn’t seem to be any way to add people who have been completely missed, and what do you do about the fact that the names are out of order?

My way around this problem is to search for all the entries for the person that have been indexed, and then I check my database for all the years I have found for them.

Jemima's checklist

I have another checklist in Excel, where I have listed all the years that are available on Ancestry, and I put the year the person was born, and when they died, as well as which years I have the Electoral rolls for.

Electoral roll checklist

For the years I’m missing, I go back and browse the images.  I choose the year, the division and the subdivision.  This is why I keep the details of the division and subdivision in my database and checklist, so that I know which subdivisions to browse.
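For anyone who would rather script this check than keep it in a spreadsheet, the idea can be sketched in Python. This is only an illustration under stated assumptions, not the Excel and Access setup described above: the function names `expand_years` and `years_to_browse` are made up for the example, and so are the birth year, death year and list of years already found. It also assumes that a range such as 1905-06 covers both years, and it ignores the asterisks because only the availability of a year matters here.

```python
import re

# Year list for Victoria as given in the post; asterisks mark transcribed years.
VICTORIA = ("1856*, 1903*, 1905-06, 1908, 1909*, 1910, 1912-13, 1914*, "
            "1915-18, 1919*, 1920-22, 1924*, 1925-28, 1931*, 1932-35, "
            "1936-37*, 1942-43*, 1949*, 1954*, 1958*, 1963*, 1967, 1968*, "
            "1972*, 1977*, 1980*")


def expand_years(listing):
    """Expand entries such as '1905-06' or '1909*' into a sorted list of years."""
    years = set()
    for entry in listing.split(","):
        match = re.fullmatch(r"(\d{4})(?:-(\d{2}))?\*?", entry.strip())
        if match is None:
            raise ValueError(f"Unrecognised entry: {entry!r}")
        start = int(match.group(1))
        if match.group(2):
            # Assumption: a two-digit end year is in the same century as the start.
            end = start // 100 * 100 + int(match.group(2))
        else:
            end = start
        years.update(range(start, end + 1))
    return sorted(years)


def years_to_browse(available, born, died, found):
    """Return the available roll years within a lifetime that are not yet found."""
    already_found = set(found)
    return [year for year in available
            if born <= year <= died and year not in already_found]


# Hypothetical person: born 1880, died 1912, rolls already found for 1903 and 1908.
available = expand_years(VICTORIA)
print(years_to_browse(available, 1880, 1912, [1903, 1908]))
# prints [1905, 1906, 1909, 1910, 1912]
```

The years printed are the ones to browse by division and subdivision. The window here runs from birth to death because those are the two dates recorded in the checklist; a later starting year could be passed in place of the birth year to skip the years before the person could have been enrolled.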



2 thoughts on “The limitations of OCR”

  1. This is another great look into your process, Lois, and a great commentary on OCR. I am curious about your process using Access and Excel – have you not found an integrated solution? Obviously this setup has been working for you long-term, and I think it speaks volumes about the functionality of genealogy software currently available. What are your thoughts on the topic?


    1. I set up the separate Excel index because I can have it open to check which years I still need while entering the new documents I find into the database at the same time, rather than having to go in and out of the forms in Access all the time. It’s also simpler to set up the list of available years in Excel.

      I think that genealogy software is great for what it does, but for my own research I found I needed a database as well to keep track of all my research, since I’ve been researching my family history for nearly 30 years now.

