Data Mining with Cubist
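Cubist Data Mining Software Tool
Data mining is the process of extracting information from an organization's databases. That information is typically used to gain insight into how the organization operates and to predict outcomes that have not yet occurred, in support of decision-making.
Cubist is a tool for building predictive models, developed by RuleQuest Research. Its built-in rule-based algorithm builds models that predict numeric values, which complements the See5/C5.0 product. For example, See5/C5.0 might classify data as "high", "medium", or "low" based on a percentage, whereas Cubist would output a number such as "7.3".
Cubist is a powerful tool: its models generally give better results than simple techniques such as multivariate linear regression, while also being easier to understand than neural networks.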
- Cubist has been designed to analyze substantial databases containing hundreds of thousands to millions of records and tens to thousands of numeric or nominal fields. If you have used neural networks or similar modeling tools, you'll be surprised by Cubist's speed! Cubist also takes advantage of processors with up to eight cores in one or more CPUs (including Intel Hyper-Threading) to speed up model-building.
- To maximize interpretability, Cubist models are expressed as collections of rules, where each rule has an associated multivariate linear model. Whenever a situation matches a rule's conditions, the associated model is used to calculate the predicted value.
- Cubist is available for Windows 7/8/10 and Linux.
- Cubist is easy to use and does not presume advanced knowledge of Statistics or Machine Learning (although these don't hurt, either!).
- RuleQuest provides C source code so that models constructed by Cubist can be embedded in your organization's own systems.
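
The rule-plus-linear-model structure described above can be sketched in a few lines of Python. This is a minimal illustration only, not Cubist's implementation or its model file format: the `Rule` class, the `predict` function, the attribute names `temperature` and `pressure`, and all coefficients are hypothetical. Averaging the predictions of all matching rules, and falling back to a default value when no rule matches, are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, float]  # one record: attribute name -> numeric value


@dataclass
class Rule:
    """A rule: a condition on the case plus an associated linear model."""
    condition: Callable[[Case], bool]
    intercept: float
    coefficients: Dict[str, float]

    def predict(self, case: Case) -> float:
        # Linear model: intercept + sum of coefficient * attribute value.
        return self.intercept + sum(
            weight * case[name] for name, weight in self.coefficients.items()
        )


def predict(rules: List[Rule], case: Case, default: float) -> float:
    """Apply every rule whose condition matches the case.

    Assumption for this sketch: predictions of all matching rules are
    averaged, and `default` is returned when no rule matches.
    """
    matched = [rule.predict(case) for rule in rules if rule.condition(case)]
    if not matched:
        return default
    return sum(matched) / len(matched)


# Hypothetical rules invented for illustration (not real Cubist output).
RULES = [
    Rule(lambda c: c["temperature"] <= 50.0, 2.0, {"temperature": 0.1}),
    Rule(lambda c: c["temperature"] > 50.0, 12.0,
         {"temperature": -0.1, "pressure": 0.5}),
]

if __name__ == "__main__":
    case = {"temperature": 40.0, "pressure": 2.0}
    print(predict(RULES, case, default=0.0))
```

Because each prediction comes from a readable condition and a small linear formula, a reader can trace exactly which rule fired and how each attribute contributed, which is the interpretability argument made in the list above.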
Platforms

| Operating System | Windows 7/8/10 | Linux |
| --- | --- | --- |
| Hardware Platform | PC | PC |
| Executable | 32-bit or 64-bit | 32-bit or 64-bit |

Licensing
We offer licensing to our previous customers for either single computers or LANs:
Single-Computer Licences:
The software can be used on a single computer (including computers with multiple CPUs). The number of concurrent users is not restricted.
Network Licences: (See5 and Cubist, Windows 7/8/10 only)
The software is installed on a single Windows PC (the "server"). After running a small registration application, any Windows PC in the server's network neighborhood can run the software so long as it remains connected to the server via the LAN.
The number of client PCs is not restricted, but the number of concurrent users is restricted. Licences for 2, 5, or 10 concurrent users represent a cost-effective alternative to single-computer licences for applications teams and research groups.
See5/C5.0 is a state-of-the-art system that constructs classifiers in the form of decision trees and rulesets. See5/C5.0 has been designed to analyze large volumes of data and incorporates innovations such as boosting.
GritBot is a sophisticated data cleansing tool that helps you to audit and maintain data quality. Working from the raw data alone, GritBot automatically explores partitions of the data that share common properties and reports surprising values in each partition. GritBot uncovers anomalies that might compromise the effectiveness of your data mining tools.