Update README_EN.md
parent 5ff2f955b8
commit 8de90d35f1
@@ -47,6 +47,4 @@
 * dataset `scientist` from this repo
 
 **Dataset Deduplication**:
-Combine absolute matching with fuzzy matching (Simhash) algorithms to deduplicate the dataset, thereby enhancing the effectiveness of the fine-tuning model. While ensuring the high quality of the dataset, the risk of losing important data due to incorrect matches can be reduced via adjusting the threshold.
+Combine absolute matching with fuzzy matching (Simhash) algorithms to deduplicate the dataset, thereby enhancing the effectiveness of the fine-tuning model. While ensuring the high quality of the dataset, the risk of losing important data due to incorrect matches can be reduced by adjusting the threshold.
-
-https://algonotes.readthedocs.io/en/latest/Simhash.html
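For readers unfamiliar with the deduplication approach the README paragraph describes, here is a minimal sketch of exact matching followed by Simhash-based fuzzy matching. It is not the repository's implementation. The function names (`simhash`, `hamming`, `deduplicate`), the MD5-based token hashing, the unit token weights, the linear scan over kept fingerprints, and the default threshold of 3 bits are all assumptions made for illustration.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumption; taken from the first 8 bytes of MD5)


def simhash(text: str) -> int:
    """Return a 64-bit Simhash fingerprint over lowercase word tokens."""
    counts = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position (unit weight).
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its votes are positive.
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(samples, threshold=3):
    """Drop exact duplicates first, then near-duplicates.

    A sample counts as a near-duplicate when its fingerprint is within
    `threshold` bits of an already kept sample. A lower threshold keeps
    more data; a negative threshold disables fuzzy matching entirely.
    """
    seen_exact = set()
    kept, kept_fingerprints = [], []
    for sample in samples:
        key = sample.strip()
        if key in seen_exact:  # absolute (exact) match
            continue
        seen_exact.add(key)
        fp = simhash(key)
        if any(hamming(fp, other) <= threshold for other in kept_fingerprints):
            continue  # fuzzy (Simhash) match
        kept.append(sample)
        kept_fingerprints.append(fp)
    return kept
```

Lowering `threshold` makes the fuzzy stage stricter about what it calls a duplicate, which is the knob the paragraph refers to for reducing the risk of discarding distinct samples. The linear scan is quadratic in the number of kept samples, so a large dataset would likely need an index over fingerprint segments instead.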