1. Title: Iris Plants Database
   Updated Sept 21 by C. Blake - Added discrepancy information

2. Sources:
     (a) Creator: R.A. Fisher
     (b) Donor: Michael Marshall (MARSHALL%[email protected])
     (c) Date: July, 1988

3. Past Usage:
   - Publications: too many to mention!!! Here are a few.
   1. Fisher, R.A. (1936) "The use of multiple measurements in taxonomic
      problems". Annals of Eugenics, 7, Part II, 179-188; also in
      "Contributions to Mathematical Statistics" (John Wiley, NY, 1950).
   2. Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
      (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218.
   3. Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
      Structure and Classification Rule for Recognition in Partially Exposed
      Environments". IEEE Transactions on Pattern Analysis and Machine
      Intelligence, Vol. PAMI-2, No. 1, 67-71.
      -- Results:
         -- very low misclassification rates (0% for the setosa class)
   4. Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE
      Transactions on Information Theory, May 1972, 431-433.
      -- Results:
         -- very low misclassification rates again
   5. See also: 1988 MLC Proceedings, 54-64. Cheeseman et al.'s AUTOCLASS II
      conceptual clustering system finds 3 classes in the data.

4. Relevant Information:
   --- This is perhaps the best-known database in the pattern recognition
       literature. Fisher's paper is a classic in the field and is still
       frequently referenced (see, for example, Duda & Hart). The data set
       contains 3 classes of 50 instances each, where each class refers to a
       type of iris plant. One class (Iris Setosa) is linearly separable from
       the other 2; the latter are NOT linearly separable from each other.
   --- Predicted attribute: class of iris plant.
   --- This is an exceedingly simple domain.
   --- This data differs from the data presented in Fisher's article
       (as identified by Steve Chadwick, [email protected]):
       The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa"
       where the error is in the fourth feature.
       The 38th sample should be: 4.9,3.6,1.4,0.1,"Iris-setosa"
       where the errors are in the second and third features.

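Section 4 notes that one class is linearly separable from the other two. The perceptron convergence theorem says Rosenblatt's learning rule reaches zero training errors after finitely many updates on linearly separable data. A perceptron trained on one class versus the rest is therefore a simple way to probe that claim. On classes that are not linearly separable it keeps making errors, which is why this sketch caps training at `max_epochs`. This is a minimal pure-Python sketch, not code from any particular library. `predict`, `train_perceptron`, and their parameters are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Unit step on the net input w.x + b: returns class 1 or class 0."""
    net = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if net >= 0.0 else 0


def train_perceptron(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    eta: float = 0.1,
    max_epochs: int = 100,
) -> Tuple[List[float], float, List[int]]:
    """Train with the perceptron rule on labels 0/1.

    Returns (weights, bias, errors_per_epoch). Stops early after the first
    epoch with no misclassifications; on linearly separable data that epoch
    is reached after finitely many updates, otherwise max_epochs applies.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    errors_per_epoch: List[int] = []
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(X, y):
            update = eta * (target - predict(w, b, xi))
            if update != 0.0:  # only misclassified samples move the boundary
                w = [wj + update * xj for wj, xj in zip(w, xi)]
                b += update
                errors += 1
        errors_per_epoch.append(errors)
        if errors == 0:
            break
    return w, b, errors_per_epoch
```

To apply it to the iris rows, label the class of interest 1 and all other rows 0, pass two or more attribute columns as `X`, and check whether the last entry of the returned error list is 0. Reaching 0 demonstrates separability. Hitting `max_epochs` with errors remaining is only suggestive, because a slowly converging separable problem looks the same.
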
5. Number of Instances: 150 (50 in each of three classes)

6. Number of Attributes: 4 numeric predictive attributes, plus the class

7. Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class:
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica

8. Missing Attribute Values: None

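Sections 5 to 8 describe the record layout: four numeric attributes in cm followed by the class label, 150 rows, and no missing values. The following is a minimal sketch of a parser and validator for comma-separated rows in that layout, like the samples quoted in section 4. The csv module strips the quotes around the class label when they are present.

Several parts are assumptions:

* The class labels are spelled `Iris-setosa`, `Iris-versicolor`, and `Iris-virginica` in the data rows. Only the first spelling appears in this document.
* `IrisSample`, `parse_row`, `load_iris`, and `apply_fisher_corrections` are hypothetical names, not part of any library.

The optional correction step overwrites the 35th and 38th rows with the values that section 4 attributes to Fisher's article.

```python
import csv
from typing import Iterable, List, NamedTuple

# Assumed spelling of the class labels in the data rows.
CLASS_LABELS = ("Iris-setosa", "Iris-versicolor", "Iris-virginica")


class IrisSample(NamedTuple):
    sepal_length: float  # cm
    sepal_width: float   # cm
    petal_length: float  # cm
    petal_width: float   # cm
    label: str


def parse_row(fields: List[str]) -> IrisSample:
    """Convert one CSV row (4 numbers + class label) into an IrisSample."""
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {fields!r}")
    # float() raises ValueError on an empty (missing) or non-numeric field.
    numbers = [float(f) for f in fields[:4]]
    label = fields[4].strip().strip('"')
    if label not in CLASS_LABELS:
        raise ValueError(f"unknown class label: {label!r}")
    return IrisSample(*numbers, label)


def load_iris(lines: Iterable[str]) -> List[IrisSample]:
    """Parse every non-blank line; blank lines (e.g. at end of file) are skipped."""
    reader = csv.reader(line for line in lines if line.strip())
    return [parse_row(row) for row in reader]


# Values that section 4 attributes to Fisher's article, keyed by 1-based row number.
FISHER_CORRECTIONS = {
    35: IrisSample(4.9, 3.1, 1.5, 0.2, "Iris-setosa"),
    38: IrisSample(4.9, 3.6, 1.4, 0.1, "Iris-setosa"),
}


def apply_fisher_corrections(samples: List[IrisSample]) -> List[IrisSample]:
    """Return a copy with rows 35 and 38 replaced by Fisher's published values."""
    if len(samples) != 150:
        raise ValueError("corrections assume the full 150-row data set")
    fixed = list(samples)
    for row_number, sample in FISHER_CORRECTIONS.items():
        fixed[row_number - 1] = sample
    return fixed
```

A file object opened with `newline=""`, as the csv module documentation recommends, can be passed directly to `load_iris`. Run `apply_fisher_corrections` only on a copy of the data that still contains the discrepancies described in section 4.
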
Summary Statistics (Min, Max, Mean and SD in cm):
                  Min   Max   Mean    SD    Class Correlation
   sepal length:  4.3   7.9   5.84   0.83    0.7826
    sepal width:  2.0   4.4   3.05   0.43   -0.4194
   petal length:  1.0   6.9   3.76   1.76    0.9490  (high!)
    petal width:  0.1   2.5   1.20   0.76    0.9565  (high!)

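The summary statistics above can be recomputed from an attribute column and the matching class labels. The file does not say how "Class Correlation" was computed or which standard deviation was used. This sketch makes two assumptions:

* Class Correlation is the Pearson correlation between the attribute and the class coded as 1, 2, 3, in the order listed in section 7.
* SD is the sample standard deviation (n - 1 denominator).

`CLASS_CODES`, `pearson`, and `summarize` are hypothetical names. The label spellings in `CLASS_CODES` are also assumed.

```python
import math
import statistics
from typing import Dict, Sequence

# Assumed integer coding of the class, in the order listed in section 7.
CLASS_CODES: Dict[str, int] = {
    "Iris-setosa": 1,
    "Iris-versicolor": 2,
    "Iris-virginica": 3,
}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarize(values: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Min, Max, Mean, SD and class correlation for one attribute column."""
    codes = [CLASS_CODES[label] for label in labels]
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1): an assumption
        "class_corr": pearson(values, codes),
    }
```

Passing one attribute column and its labels yields one row of the table. If the numbers disagree with the published figures, revisit the assumed class coding or SD definition first.
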
9. Class Distribution: 33.3% for each of 3 classes.
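The class distribution follows from section 5: 50 of 150 instances per class is 100 × 50 / 150 ≈ 33.3%. The following minimal sketch recomputes the distribution from a list of class labels. `class_distribution` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from typing import Dict, Iterable


def class_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Percentage of rows per class label, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}
```

For a list with 50 copies of each of the three labels, every value comes out as 33.3.
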