The problem I’ve had with finding duplicates in the past is that it all tended to get out of hand. I’d get 500 or so marked as Not Duplicates and fatigue would set in. By the time I got back to it I’d added more people to my database and forgotten what I was doing.
There was a time, albeit a million years ago, when I could navigate through the generations to find someone in my database faster than I could find them in the Name List. With over 12,000 people in my database now that’s no longer the case.
My main Cuz, who I share my paternal lines with, works on about 200 people at a time and sends me files to merge. Another cousin recently made a huge contribution to my maternal grandfather’s line. Dozens of those people were or are married to relatives in my maternal grandmother’s line. That’s what comes of never leaving one county for 250+ years.
Version 8 of Legacy is coming with an automatic duplicate checker. As I understand it, that means we’ll be told right away if we enter someone who’s already in our database so we can deal with it up front.
I’m assuming that it’s not going to fix the duplicates that are already there. So, persistence is key.
When I ran Find Duplicates this morning it came up with about 1,500 of them. Most of those are not duplicates but the default search options tend to be overkill.
Since I hadn’t done this for a while, the nuclear option was to clear the Not Duplicates List from former efforts. If it had been done conscientiously the first time, it wouldn’t be necessary to go back over it. I don’t remember how well I was paying attention, so I started there.
First I went to the Not Duplicates List and deleted everyone.
It would be nice if there were a Clear All button here but there isn’t.
To remove one entry at a time, I highlighted the first and clicked Remove over and over until the list was gone.
Now there were five choices:
- Left person is a parent of the right person (not a duplicate)
- Opposite gender (not a duplicate)
- Is a duplicate
- Isn’t a duplicate
- I’m not sure
My mind is, frankly, too lazy to consider five options at the same time so I broke it down into two sections.
Finding Duplicates, Part 1
After resetting the Find Duplicates screen to its default options (click the Reset button), I ran the search and went through the list of 2,502 possible matching pairs, clicking Skip to Next whenever it didn’t advance automatically. I only paid attention to the two pop-up windows and ignored everything else.
Merge Option 1) left person is a parent of the right person.
I tended to skip through these by clicking the Not Dup, Skip to Next button. I assumed Legacy was probably telling me the truth: if I had screwed up badly enough to confuse parents with their children, the problem would have shown up some other way by now, such as in the Potential Problems report or in the marriage, birth or death dates.
Merge Option 2) The person selected is the opposite gender
I selected Yes to view, took a quick glance at the names and then marked them as Not Duplicates.
If I saw something like Frances and Francis of the same parents I’d skip over those to look at later.
The first pass through took about an hour and I got 2,502 down to 1,222 with 1,441 Not Duplicates.
This was quite a dizzying experience, clicking through 2,502 pairs. Fortunately, Legacy gives the option to save the merge midstream when it’s closed. The next time I clicked on Find Duplicates, the merge would continue from where it left off.
Finding Duplicates, Part 2
When I came to the end of the pairs I went back to Find Duplicates and created a new list. There were only three options left. Either the pair were Duplicates, obviously Not Duplicates or I’m Not Sure.
Merge Option 3) If it was an easy duplicate to merge, I merged it.
If it was one that was going to take some thought, such as my version of a person vs. Cuz’s version, I tagged it for later.
Merge Option 4) If a pair were obviously not duplicates they were marked as Not Duplicates. Anyone who goes into the Not Duplicates List is gone as far as Merge is concerned.
The Not Duplicates are not going to come up again in any future running of Find Duplicates and that’s a good thing.
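For anyone who likes to see the idea spelled out, here is a minimal Python sketch of how a Not Duplicates list keeps a pair from coming back. This is not how Legacy actually works inside; the function name, the made-up people and the simple "same name" test are all my own assumptions, just to illustrate the principle.

```python
from itertools import combinations


def find_candidate_pairs(people, not_duplicates):
    """Return pairs of record ids that share a name, skipping ruled-out pairs.

    people: dict mapping a record id to a name (hypothetical sample data).
    not_duplicates: set of frozensets, each holding two record ids that
    were already marked as Not Duplicates.
    """
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(sorted(people.items()), 2):
        # Assumed matching rule: names are equal once case and stray spaces are ignored.
        if name_a.strip().lower() != name_b.strip().lower():
            continue
        # A pair on the Not Duplicates list is never offered again.
        if frozenset((id_a, id_b)) in not_duplicates:
            continue
        pairs.append((id_a, id_b))
    return pairs


# Hypothetical records, not taken from any real database.
people = {1: "Mary Unknown", 2: "mary unknown", 3: "John Smith", 4: "John Smith "}
not_duplicates = set()

first_run = find_candidate_pairs(people, not_duplicates)

# Mark the two Marys as Not Duplicates, then search again.
not_duplicates.add(frozenset((1, 2)))
second_run = find_candidate_pairs(people, not_duplicates)
```

In this sketch the first search offers both pairs, and the second search offers only the two John Smiths, because the Marys were already ruled out.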
Merge Option 5) The I’m Not Sure pairs are the ones that I tagged, along with anything I knew needed merging but didn’t want to deal with right now.
First, I made sure that Tag Number 1 was not being used for anything else. In the Advanced Tagging box, I untagged everyone using that number just in case.
I like to err on the side of caution and tagged any pair that could be duplicates, except for the bazillion versions of Mary Unknown because I figure, for now, she’s hopeless anyway.
Once I had my list of possible duplicates, I did a search on Tag 1 and brought up the list. The people line up next to each other in obvious pairs by name, which makes it much easier to click from one to the other.
I use two bookmarks to mark one pair at a time so I can click back and forth to see what I have in the way of dates and sources to help. If they’re close I’m back in Research Heaven looking for the puzzle piece to lay this to rest.
With 2,367 pairs now tucked away in the Not Duplicates List, I’m left with a working list of 408 people who might be duplicates. If any of them are, I use Manual Merge to merge them and then remove the tags. If they’re not, I use Manual Merge to mark them as Not Duplicates and remove the tags. If I’m still not sure, I leave them tagged.
I use Tag 1 for a lot of other things so I moved my list to Tag 7. When, in Legacy 8, we’re able to see all nine tags at once I’ll be able to see Tag 7 and be reminded that it’s a potential duplicate. Whatever I can’t figure out today may be tomorrow’s Happy Dance.