Update README_EN.md

MING_X 2024-04-21 17:33:33 +08:00 committed by GitHub
parent 5ff2f955b8
commit 8de90d35f1
GPG Key ID: B5690EEEBB952194

@@ -47,6 +47,4 @@
 * dataset `scientist` from this repo
 **Dataset Deduplication**
-Combine absolute matching with fuzzy matching (Simhash) algorithms to deduplicate the dataset, thereby enhancing the effectiveness of the fine-tuning model. While ensuring the high quality of the dataset, the risk of losing important data due to incorrect matches can be reduced via adjusting the threshold.
+Combine absolute matching with fuzzy matching (Simhash) algorithms to deduplicate the dataset, thereby enhancing the effectiveness of the fine-tuning model. While ensuring the high quality of the dataset, the risk of losing important data due to incorrect matches can be reduced by adjusting the threshold.
 https://algonotes.readthedocs.io/en/latest/Simhash.html
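
The deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the function names (`simhash`, `hamming`, `deduplicate`), the word-level tokenization, the MD5-based token hash, the 64-bit fingerprint size, and the default threshold of 3 are all assumptions chosen for the example.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Return a Simhash fingerprint of `text` (hypothetical helper)."""
    tokens = re.findall(r"\w+", text.lower())
    weights = [0] * bits
    for tok in tokens:
        # Assumption: the first 8 bytes of MD5 serve as a 64-bit token hash.
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(records, threshold=3):
    """Drop exact duplicates first, then near-duplicates by Simhash distance.

    A lower `threshold` is stricter about what counts as a duplicate, so
    fewer records are dropped by mistake; a higher one removes more.
    """
    seen_exact = set()
    fingerprints = []
    kept = []
    for text in records:
        # Absolute matching on a whitespace- and case-normalized key.
        key = " ".join(text.split()).lower()
        if key in seen_exact:
            continue
        seen_exact.add(key)
        # Fuzzy matching: compare against every fingerprint kept so far.
        fp = simhash(key)
        if any(hamming(fp, other) <= threshold for other in fingerprints):
            continue
        fingerprints.append(fp)
        kept.append(text)
    return kept
```

The pairwise comparison is quadratic in the number of kept records, which is acceptable for small fine-tuning datasets; larger ones would typically need an index over fingerprint segments. Passing a negative `threshold` disables the fuzzy step entirely, leaving only exact matching.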