Checking Data Quality with GritBot
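Data mining refers to techniques for extracting valuable information from an organization's databases. The patterns that data mining finds depend heavily on the data being analyzed: if that data contains erroneous values, the result is "garbage in, garbage out".
GritBot is an anomaly detection tool developed by RuleQuest Research. It automatically searches for anomalous values in data before that data is analyzed. It can be thought of as an autonomous data quality auditor that looks for anomalous values of discrete or continuous attributes in a database. Using GritBot can improve the effectiveness of the models that tools such as See5/C5.0 and Cubist construct from a dataset.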
- GritBot has been designed to analyze substantial databases containing tens or hundreds of thousands of records and many numeric or nominal fields.
- Possibly anomalous values that GritBot identifies are reported, together with an explanation of why each value seems surprising.
- The patterns found by GritBot can be saved and used to check new data. Potential anomalies found in new data can differ from the types of anomalies originally identified.
- GritBot is virtually automatic -- the user does not require a knowledge of Statistics or Data Analysis.
- GritBot is available for Windows 7/8/10 and Linux, in 32-bit or 64-bit versions.
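The text above does not describe how GritBot works internally, so the following is only a minimal sketch of the general idea behind the bullets: a value is reported as possibly anomalous when it is surprising for the subset of records it belongs to, and each report carries an explanation. The function name `flag_surprising`, the median-based score, the threshold, and the sample records are all assumptions made for illustration; they are not GritBot's method or output.

```python
from statistics import median


def flag_surprising(records, group_key, value_key, threshold=5.0):
    """Return (record, explanation) pairs for values far from their group's median.

    Illustrative only: a simple robust-deviation check, not GritBot's algorithm.
    """
    # Partition the records by the value of a nominal field.
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec)

    findings = []
    for group, members in groups.items():
        values = [m[value_key] for m in members]
        centre = median(values)
        # Median absolute deviation: a spread estimate that outliers barely affect.
        mad = median(abs(v - centre) for v in values)
        if mad == 0:
            continue  # no spread to compare against, so skip this group
        for m in members:
            score = abs(m[value_key] - centre) / mad
            if score > threshold:
                explanation = (
                    f"{value_key}={m[value_key]} is unusual when "
                    f"{group_key}={group} (group median {centre})"
                )
                findings.append((m, explanation))
    return findings


# Hypothetical sample data: one age is implausible for its category.
sample = [
    {"category": "child", "age": 5},
    {"category": "child", "age": 6},
    {"category": "child", "age": 7},
    {"category": "child", "age": 8},
    {"category": "child", "age": 70},
    {"category": "adult", "age": 30},
    {"category": "adult", "age": 35},
    {"category": "adult", "age": 40},
    {"category": "adult", "age": 45},
    {"category": "adult", "age": 50},
]

findings = flag_surprising(sample, "category", "age")
for record, explanation in findings:
    print(explanation)
```

An age of 70 would be unremarkable across the whole sample but stands out among the records marked as children. That is the sense in which a value can be surprising only in context.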
We offer licensing to our previous customers for either single computers or LANs:
Single-Computer Licences:
The software can be used on a single computer (including computers with multiple CPUs). The number of concurrent users is not restricted.
Network Licences: (See5 and Cubist, Windows 7/8/10 only)
The software is installed on a single Windows PC (the "server"). After running a small registration application, any Windows PC in the server's network neighborhood can run the software so long as it remains connected to the server via the LAN.
The number of client PCs is not restricted, but the number of concurrent users is restricted. Licences for 2, 5, or 10 concurrent users represent a cost-effective alternative to single-computer licences for applications teams and research groups.
See5 / C5.0
This state-of-the-art system constructs classifiers in the form of decision trees and rulesets. See5/C5.0 has been designed to analyze large volumes of data and incorporates innovations such as boosting.
Cubist
Cubist produces rule-based models for numerical prediction. Each rule specifies the conditions under which an associated multivariate linear sub-model should be used. Cubist models often yield more accurate predictions than simple linear models without giving up the advantages of interpretability.
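To make the structure of such a model concrete, here is a minimal sketch of a rule-based numeric predictor in which each rule pairs a condition with a linear sub-model. The `Rule` class, the `predict_case` function, the attribute `area`, and every coefficient are hypothetical and chosen only for illustration; they are not produced by Cubist. Averaging the predictions of all matching rules is likewise an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A condition plus the linear sub-model used when the condition holds."""
    condition: Callable[[Dict[str, float]], bool]
    intercept: float
    coefficients: Dict[str, float]

    def predict(self, case: Dict[str, float]) -> float:
        # Linear sub-model: intercept + sum of coefficient * attribute value.
        return self.intercept + sum(
            weight * case[name] for name, weight in self.coefficients.items()
        )


def predict_case(rules: List[Rule], case: Dict[str, float], default: float) -> float:
    """Average the sub-model predictions of every rule whose condition matches."""
    matches = [rule.predict(case) for rule in rules if rule.condition(case)]
    if not matches:
        return default  # no rule applies, so fall back to a default value
    return sum(matches) / len(matches)


# Two hypothetical rules that split cases on a single attribute.
rules = [
    Rule(lambda c: c["area"] <= 100, intercept=50.0, coefficients={"area": 2.0}),
    Rule(lambda c: c["area"] > 100, intercept=100.0, coefficients={"area": 1.5}),
]

small = predict_case(rules, {"area": 80.0}, default=0.0)
large = predict_case(rules, {"area": 120.0}, default=0.0)
print(small, large)
```

Because each rule reads as "if these conditions hold, use this linear formula", a model of this shape can be inspected rule by rule, which is the interpretability advantage mentioned above.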