Find Substrings in a Dynamic Collection of Strings

This question is a little complicated, so I will try to describe it through an example.
First, we get a string `foo` and put it into collection `S`.
Then we get a string `sample` and put it into `S` too.
Next, we get a string `oo`. Obviously `oo` is a substring of `foo`, so collection `S` now contains three members: `foo`, `sample`, and `oo`, and `foo` and `oo` are in the same group.
The next string added to `S` is `food`, which is in the same group as `foo` and `oo`.
And so on.
Finally we get a large collection in which all members are grouped.
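
To make the intended behavior concrete, here is a naive sketch of the grouping described above. It is only a baseline, not a proposed solution: every new string is compared against all strings already in the collection. The class name `SubstringGroups` and its methods are made up for illustration, and the sketch assumes that groups are transitive (if `oo` is in both `foo` and `boot`, all three end up together), which is one possible reading of the example.

```python
class SubstringGroups:
    """Naive incremental grouping: two strings share a group
    if one of them is a substring of the other (hypothetical helper)."""

    def __init__(self):
        # Union-find parent pointers, keyed by the string itself.
        self.parent = {}

    def _find(self, s):
        # Walk up to the group's root, shortening the path as we go.
        while self.parent[s] != s:
            self.parent[s] = self.parent[self.parent[s]]
            s = self.parent[s]
        return s

    def add(self, s):
        # Insert a string and merge it with every group it overlaps.
        if s in self.parent:
            return
        existing = list(self.parent)
        self.parent[s] = s
        for other in existing:
            # Python's `in` compares Unicode code points, without normalization.
            # Note: an empty string would match (and merge) everything.
            if s in other or other in s:
                self.parent[self._find(s)] = self._find(other)

    def groups(self):
        # Return the groups as sorted lists, for a stable result.
        out = {}
        for s in self.parent:
            out.setdefault(self._find(s), []).append(s)
        return sorted(sorted(g) for g in out.values())


collection = SubstringGroups()
for item in ["foo", "sample", "oo", "food"]:
    collection.add(item)
print(collection.groups())
```

Each `add` scans the whole collection, so the total cost grows quadratically with the number of strings, which is what I would like to avoid for a large collection.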
I want to use an algorithm (or a combination of algorithms) like this to process duplicate files, but there are some obvious roadblocks:
- dynamic collection
- Unicode
- no fixed pattern
Any suggestions?