Find Substrings In A Dynamic Collection Of Strings

Feb 8, 2025 - 05:43
This question is a little complicated, so I'll try to describe it through an example.

First, we get a string foo, and put it into collection S.

Then we get a string sample, and put it into S too.

Next, we get a string oo. Obviously oo is a substring of foo, so collection S now contains three members: foo, sample, and oo, with foo and oo in the same group.

The next string in S is food. It contains foo as a substring, so it is in the same group as foo and oo.

And so on.

Finally, we get a large collection in which every member belongs to a group.
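
To make the intended behavior concrete, here is a minimal, naive sketch in Python. The class name SubstringGroups and its methods are hypothetical names made up for illustration, and I am assuming that two strings share a group whenever one is a substring of the other, with groups merging transitively. It compares each new string against every existing member, so it only illustrates the grouping rule and is not meant as an efficient solution.

```python
class SubstringGroups:
    """Naive incremental grouping (hypothetical helper, for illustration only).

    Assumption: two strings belong to the same group when one is a
    substring of the other; groups merge transitively.
    """

    def __init__(self):
        self.parent = {}  # union-find parent pointers, keyed by string

    def _find(self, s):
        # Walk up to the group's root, shortening the path along the way.
        while self.parent[s] != s:
            self.parent[s] = self.parent[self.parent[s]]
            s = self.parent[s]
        return s

    def add(self, s):
        if s in self.parent:
            return  # already a member of S
        existing = list(self.parent)
        self.parent[s] = s
        # Compare against every existing member: O(len(S)) substring checks.
        for other in existing:
            if s in other or other in s:
                self.parent[self._find(s)] = self._find(other)

    def groups(self):
        out = {}
        for s in self.parent:
            out.setdefault(self._find(s), []).append(s)
        return list(out.values())


sg = SubstringGroups()
for s in ["foo", "sample", "oo", "food"]:
    sg.add(s)
print(sg.groups())  # [['foo', 'oo', 'food'], ['sample']]
```

Python's `in` operator on `str` compares Unicode code points and does no normalization, so strings that look identical but are encoded differently would not match in this sketch.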

I want to use an algorithm like this (or a combination of algorithms) to process duplicate files, but there are some obvious roadblocks:

  • dynamic collection
  • Unicode
  • no fixed pattern

Any suggestions?