Implement the k-mer model
This is a simplification originally proposed by Oliver Mailhot.
We represent the transcriptome by its k-mer content and use the score table to hybridize those k-mers against microRNA seeds. This model disregards local context such as supplementary binding, site overlap, etc.
For each transcript, generate targets from its k-mers:
- if the k-mer is novel, set its abundance to the transcript expression
- if the k-mer has been seen before, increase its abundance by the transcript expression
- add a back-reference from the k-mer to the original transcript location
The back-reference maps k-mers to their occurrences in the transcriptome. It could simply be a MirbookingTarget -> List<MirbookingTargetSite> hash table.
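
The aggregation steps above could be sketched roughly as follows. This is only an illustration in Python, not the library's API: `build_kmer_index`, the `(sequence, expression)` tuples and the default `k=7` are all hypothetical, and plain dictionaries stand in for the MirbookingTarget -> List<MirbookingTargetSite> hash table.

```python
from collections import defaultdict


def build_kmer_index(transcripts, k=7):
    """Collapse transcripts into k-mer abundances plus a back-reference.

    transcripts: dict mapping transcript id -> (sequence, expression).
    Returns (abundance, occurrences) where
      abundance[kmer]   is the summed expression over every occurrence, and
      occurrences[kmer] is a list of (transcript id, position) pairs.
    All names here are hypothetical and chosen for illustration.
    """
    abundance = defaultdict(float)
    occurrences = defaultdict(list)
    for tid, (seq, expression) in transcripts.items():
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            # A novel k-mer starts at 0.0, so this covers both the
            # "novel" and the "seen before" cases from the list above.
            abundance[kmer] += expression
            # Back-reference to the original transcript location.
            occurrences[kmer].append((tid, pos))
    return dict(abundance), dict(occurrences)


# Toy input: two transcripts sharing the 4-mers CGTA and GTAC.
transcripts = {"t1": ("ACGTACG", 2.0), "t2": ("CGTAC", 3.0)}
abundance, occurrences = build_kmer_index(transcripts, k=4)
```

Note that in this sketch a k-mer occurring several times within the same transcript contributes that transcript's expression once per occurrence.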
When generating the output, use the back-reference to assign occupants to the original sequences in proportion to each transcript's contribution to the k-mer abundance: for each occurrence, divide the transcript abundance by the k-mer abundance and multiply by the occupant quantity.
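
As a rough sketch of that redistribution step, assuming the back-reference is a list of (transcript id, position) pairs and expression levels are available per transcript; `redistribute` and its parameters are hypothetical names used only for illustration:

```python
def redistribute(occupant_quantity, kmer_abundance, occurrences, expression):
    """Split one occupant's quantity on a k-mer across its occurrences.

    occurrences: list of (transcript id, position) pairs for the k-mer.
    expression:  dict mapping transcript id -> expression level.
    Each occurrence receives
        expression[transcript] / kmer_abundance * occupant_quantity.
    """
    return {
        (tid, pos): expression[tid] / kmer_abundance * occupant_quantity
        for tid, pos in occurrences
    }


# Toy example: a k-mer with abundance 5.0 coming from two transcripts
# expressed at 2.0 and 3.0, carrying an occupant quantity of 1.0.
shares = redistribute(
    occupant_quantity=1.0,
    kmer_abundance=5.0,
    occurrences=[("t1", 1), ("t2", 0)],
    expression={"t1": 2.0, "t2": 3.0},
)
```

Because the k-mer abundance is the sum of the expression over all occurrences, the shares add up to the occupant quantity on the k-mer.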
The solution may violate conservation of mass, since the model is not aware of the neighbourhood of a k-mer, which is specific to each occurrence.
Implement this as a separate binary, since it will require some extra logic for the back-reference. We want to move as many of the utilities as possible back into the library to maximize code reuse.