Update README_EN.md

MING_X 2024-04-21 17:33:33 +08:00 committed by GitHub
parent 5ff2f955b8
commit 8de90d35f1
GPG Key ID: B5690EEEBB952194

@@ -47,6 +47,4 @@
* dataset `scientist` from this repo
**Dataset Deduplication**
Combine absolute matching with fuzzy matching (Simhash) algorithms to deduplicate the dataset, thereby enhancing the effectiveness of the fine-tuning model. While ensuring the high quality of the dataset, the risk of losing important data due to incorrect matches can be reduced via adjusting the threshold.
https://algonotes.readthedocs.io/en/latest/Simhash.html
Combine absolute matching with fuzzy matching (Simhash) algorithms to deduplicate the dataset, thereby enhancing the effectiveness of the fine-tuning model. While ensuring the high quality of the dataset, the risk of losing important data due to incorrect matches can be reduced by adjusting the threshold.
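
The deduplication idea described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the function names (`simhash`, `hamming`, `deduplicate`), the word-level tokenization, the MD5-based token hash, and the default threshold of 3 bits are all assumptions made for the example.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute a Simhash fingerprint from word tokens (illustrative tokenization)."""
    tokens = re.findall(r"\w+", text.lower())
    weights = [0] * bits
    for tok in tokens:
        # Take the first 8 bytes of MD5 as a 64-bit token hash (an assumption).
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def deduplicate(records, threshold=3):
    """Drop exact duplicates first, then near-duplicates within `threshold` bits."""
    seen_exact = set()
    kept, kept_fps = [], []
    for record in records:
        # Absolute matching on whitespace- and case-normalized text.
        key = " ".join(record.split()).lower()
        if key in seen_exact:
            continue
        # Fuzzy matching: compare against fingerprints of records already kept.
        fp = simhash(key)
        if any(hamming(fp, other) <= threshold for other in kept_fps):
            continue
        seen_exact.add(key)
        kept.append(record)
        kept_fps.append(fp)
    return kept
```

A lower `threshold` keeps more records (fewer false matches, more surviving near-duplicates), while a higher one removes more aggressively. The linear scan over kept fingerprints is quadratic overall, so a large dataset would likely need an index over fingerprint segments instead.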