This document will step you through the process of reading in and formatting weather data downloaded from NOAA.
I chose to download the “Local Climatological Data” (LCD) recorded at the Klamath Falls International Airport from 1959 to 2017. You can download weather data from other stations here. Here is the documentation of the LCD. Besides selecting a station to download from, you’ll need to specify the dates for which you want the data. I could only download 10 years’ worth of data at a time, so I had to combine the several downloads into one dataset, called “KFweather.csv”. I did this in R by reading in each .csv file, using the function “rbind” to combine them, and using “write.table” to write the result to “KFweather.csv”. You can either download your own data from NOAA or use mine.
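As a rough illustration of that combining step, here is a minimal sketch. The two small files it creates (“part1.csv” and “part2.csv”), their contents, and the output name “combined.csv” are made up so the example runs on its own; with real downloads you would point “list.files” at the folder that holds them instead.

```r
# Made-up stand-ins for two separate downloads that share the same columns.
tmp <- tempfile("lcd_")
dir.create(tmp)
write.csv(data.frame(DATE = c("1959-09-01 01:00", "1959-09-01 02:00"),
                     HOURLYDRYBULBTEMPF = c("55", "54s")),
          file.path(tmp, "part1.csv"), row.names = FALSE)
write.csv(data.frame(DATE = c("1969-09-01 01:00", "1969-09-01 02:00"),
                     HOURLYDRYBULBTEMPF = c("60", "61")),
          file.path(tmp, "part2.csv"), row.names = FALSE)

# Find the pieces, read each one, and stack them with rbind.
files <- list.files(tmp, pattern = "^part.*\\.csv$", full.names = TRUE)
pieces <- lapply(files, read.table, header = TRUE, sep = ",",
                 stringsAsFactors = FALSE)
combined <- do.call(rbind, pieces)

# Write the stacked data to a single comma-separated file.
write.table(combined, file.path(tmp, "combined.csv"),
            sep = ",", row.names = FALSE)
dim(combined)
```

Note that “rbind” needs every piece to have the same column names, which holds when all the downloads come from the same NOAA product.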
After downloading the data to your computer, try opening it in R with the “read.table” function.
mypath<-"C:\\Users\\rosanna.overholser\\OneDrive - Oregon Institute of Technology\\Courses\\math407\\data\\weather\\"
myfile<-paste(mypath, "KFweather.csv", sep='')
# note that if I didn't use "header=TRUE" I would have gotten an error.
# also note that if I didn't use "sep=','" I would have gotten an error
# because "read.table" assumes that the values are separated by white space
# (spaces or tabs) unless "sep" is set to another delimiter.
dd<-read.table(myfile, header=TRUE, sep=',')
# wow, that's a lot of data! 510637 observations of 93 variables.
str(dd)
## 'data.frame': 510637 obs. of 93 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ STATION : Factor w/ 1 level "WBAN:94236": 1 1 1 1 1 1 1 1 1 1 ...
## $ STATION_NAME : Factor w/ 1 level "KLAMATH FALLS INTERNATIONAL AIRPORT OR US": 1 1 1 1 1 1 1 1 1 1 ...
## $ ELEVATION : num 1245 1245 1245 1245 1245 ...
## $ LATITUDE : num 42.1 42.1 42.1 42.1 42.1 ...
## $ LONGITUDE : num -122 -122 -122 -122 -122 ...
## $ DATE : Factor w/ 510637 levels "1959-09-01 01:00",..: 407242 407243 407244 407245 407246 407247 407248 407249 407250 407251 ...
## $ REPORTTPYE : Factor w/ 6 levels "AUTO","FM-15",..: 2 2 2 2 2 3 2 3 2 3 ...
## $ HOURLYSKYCONDITIONS : Factor w/ 19031 levels "","*","* SCT:04 100 BKN:07 200",..: 4677 13126 13120 13111 13109 13106 13104 3148 3148 16601 ...
## $ HOURLYVISIBILITY : Factor w/ 153 levels "","*","0","0.00",..: 58 58 58 58 58 58 58 58 58 58 ...
## $ HOURLYPRSENTWEATHERTYPE : Factor w/ 373 levels "","-DZ:01 |DZ:51 |",..: 1 1 348 1 348 348 13 14 14 13 ...
## $ HOURLYDRYBULBTEMPF : Factor w/ 195 levels "","-1","-10",..: 55 59 61 63 65 63 65 66 66 71 ...
## $ HOURLYDRYBULBTEMPC : Factor w/ 462 levels "","-0.2","-0.5",..: 73 9 4 167 171 167 171 173 175 354 ...
## $ HOURLYWETBULBTEMPF : Factor w/ 96 levels "","-1","-10",..: 46 48 50 51 52 51 52 52 53 56 ...
## $ HOURLYWETBULBTEMPC : Factor w/ 512 levels "","-0.1","-0.2",..: 136 17 14 8 2 6 289 290 292 420 ...
## $ HOURLYDewPointTempF : Factor w/ 165 levels "","-1","-10",..: 72 76 76 78 81 81 83 81 83 89 ...
## $ HOURLYDewPointTempC : Factor w/ 399 levels "","-0.1","-0.2",..: 156 103 103 19 12 10 6 10 6 252 ...
## $ HOURLYRelativeHumidity : Factor w/ 97 levels "","*","10","100",..: 91 91 87 86 87 92 91 85 87 85 ...
## $ HOURLYWindSpeed : Factor w/ 61 levels "","0","1","10",..: 2 2 28 2 51 28 2 28 2 12 ...
## $ HOURLYWindDirection : Factor w/ 62 levels "","0","000","010",..: 3 3 18 3 16 7 3 14 3 36 ...
## $ HOURLYWindGustSpeed : int NA NA NA NA NA NA NA NA NA NA ...
## $ HOURLYStationPressure : Factor w/ 170 levels "","24.86","24.87",..: 111 109 109 108 106 105 105 105 107 105 ...
## $ HOURLYPressureTendency : int 6 NA NA 8 NA NA NA NA 5 NA ...
## $ HOURLYPressureChange : Factor w/ 67 levels "","-0.00","-0.01",..: NA NA NA NA NA NA NA NA NA NA ...
## $ HOURLYSeaLevelPressure : Factor w/ 213 levels "","28.91","28.92",..: 134 130 131 128 127 1 126 1 128 1 ...
## $ HOURLYPrecip : Factor w/ 53 levels "","0.00","0.01",..: 53 2 53 53 53 53 3 3 3 1 ...
## $ HOURLYAltimeterSetting : Factor w/ 209 levels "","28.91","28.92",..: 138 135 136 134 131 130 130 130 132 130 ...
## $ DAILYMaximumDryBulbTemp : Factor w/ 99 levels "","100","101",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYMinimumDryBulbTemp : Factor w/ 119 levels "","-1","-10",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYAverageDryBulbTemp : Factor w/ 126 levels "","-2","-6","-7",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYDeptFromNormalAverageTemp : Factor w/ 460 levels "","-0.1","-0.1s",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYAverageRelativeHumidity : logi NA NA NA NA NA NA ...
## $ DAILYAverageDewPointTemp : logi NA NA NA NA NA NA ...
## $ DAILYAverageWetBulbTemp : logi NA NA NA NA NA NA ...
## $ DAILYHeatingDegreeDays : Factor w/ 105 levels "","0","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYCoolingDegreeDays : Factor w/ 25 levels "","0","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYSunrise : int 736 736 736 736 736 736 736 736 736 736 ...
## $ DAILYSunset : int 1646 1646 1646 1646 1646 1646 1646 1646 1646 1646 ...
## $ DAILYWeather : Factor w/ 145 levels "","BR:13","BR:13 FG:01",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYPrecip : Factor w/ 112 levels "","0.00","0.00s",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYSnowfall : int NA NA NA NA NA NA NA NA NA NA ...
## $ DAILYSnowDepth : int NA NA NA NA NA NA NA NA NA NA ...
## $ DAILYAverageStationPressure : num NA NA NA NA NA NA NA NA NA NA ...
## $ DAILYAverageSeaLevelPressure : logi NA NA NA NA NA NA ...
## $ DAILYAverageWindSpeed : num NA NA NA NA NA NA NA NA NA NA ...
## $ DAILYPeakWindSpeed : Factor w/ 96 levels "","10","11","12",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ PeakWindDirection : Factor w/ 72 levels "","010","020",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYSustainedWindSpeed : Factor w/ 56 levels "","10","10s",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ DAILYSustainedWindDirection : Factor w/ 55 levels "","010","020",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyMaximumTemp : num NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlyMinimumTemp : num NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlyMeanTemp : num NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlyAverageRH : logi NA NA NA NA NA NA ...
## $ MonthlyDewpointTemp : logi NA NA NA NA NA NA ...
## $ MonthlyWetBulbTemp : logi NA NA NA NA NA NA ...
## $ MonthlyAvgHeatingDegreeDays : logi NA NA NA NA NA NA ...
## $ MonthlyAvgCoolingDegreeDays : logi NA NA NA NA NA NA ...
## $ MonthlyStationPressure : num NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlySeaLevelPressure : num NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlyAverageWindSpeed : logi NA NA NA NA NA NA ...
## $ MonthlyTotalSnowfall : logi NA NA NA NA NA NA ...
## $ MonthlyDeptFromNormalMaximumTemp : num NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlyDeptFromNormalMinimumTemp : num NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlyDeptFromNormalAverageTemp : num NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlyDeptFromNormalPrecip : Factor w/ 145 levels "","-0.01s","-0.03s",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyTotalLiquidPrecip : Factor w/ 135 levels "","0.00","0.00s",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyGreatestPrecip : logi NA NA NA NA NA NA ...
## $ MonthlyGreatestPrecipDate : logi NA NA NA NA NA NA ...
## $ MonthlyGreatestSnowfall : logi NA NA NA NA NA NA ...
## $ MonthlyGreatestSnowfallDate : logi NA NA NA NA NA NA ...
## $ MonthlyGreatestSnowDepth : logi NA NA NA NA NA NA ...
## $ MonthlyGreatestSnowDepthDate : logi NA NA NA NA NA NA ...
## $ MonthlyDaysWithGT90Temp : Factor w/ 26 levels "","0","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyDaysWithLT32Temp : Factor w/ 23 levels "","0","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyDaysWithGT32Temp : Factor w/ 37 levels "","0","1","10",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyDaysWithLT0Temp : int NA NA NA NA NA NA NA NA NA NA ...
## $ MonthlyDaysWithGT001Precip : logi NA NA NA NA NA NA ...
## $ MonthlyDaysWithGT010Precip : logi NA NA NA NA NA NA ...
## $ MonthlyDaysWithGT1Snow : logi NA NA NA NA NA NA ...
## $ MonthlyMaxSeaLevelPressureValue : logi NA NA NA NA NA NA ...
## $ MonthlyMaxSeaLevelPressureDate : int -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
## $ MonthlyMaxSeaLevelPressureTime : int -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
## $ MonthlyMinSeaLevelPressureValue : logi NA NA NA NA NA NA ...
## $ MonthlyMinSeaLevelPressureDate : int -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
## $ MonthlyMinSeaLevelPressureTime : int -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
## $ MonthlyTotalHeatingDegreeDays : Factor w/ 153 levels "","*","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyTotalCoolingDegreeDays : Factor w/ 66 levels "","0","1","10",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyDeptFromNormalHeatingDD : Factor w/ 132 levels "","-1","-105",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyDeptFromNormalCoolingDD : Factor w/ 62 levels "","-1","-10",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ MonthlyTotalSeasonToDateHeatingDD: logi NA NA NA NA NA NA ...
## $ MonthlyTotalSeasonToDateCoolingDD: logi NA NA NA NA NA NA ...
## $ Date : Factor w/ 510637 levels "(01/01/00 23:59:00)",..: 193 194 195 196 197 198 199 200 201 202 ...
## $ Month : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 5 5 5 5 5 5 5 5 5 ...
Notice in the output of “str(dd)” that a lot of variables that seem like they should have been quantitative (i.e. numerical) ended up being “factors” in R. This is not ideal, since the models we eventually train will be fit differently depending on whether R thinks the predictors are quantitative or qualitative (i.e. “factors”). Let’s take a closer look at one variable we’re certainly interested in, the temperature in degrees F (HOURLYDRYBULBTEMPF).
There are too many observations to print out all the values of dd$HOURLYDRYBULBTEMPF, and if I used “head” to see only the first 6, I probably wouldn’t see enough values to tell what caused R to call the variable a factor instead of a numerical variable. The function “unique” is a nice way to remove the redundancies while still showing every observed value.
unique(dd$HOURLYDRYBULBTEMPF)
## [1] 28 30 31 32 33 34 37 36 38 41 42 43 39
## [15] 35 29 27 24 22 21 20 19 18 17 16 15 26 23
## [29] 14 13 12 25 40 44 45 49 47 51 46 36s 48 50
## [43] 52 54 59 61 39s 53 56 58 57 55 60 62 * 64
## [57] 66 65 67 63s 68 69 71 73 74 75 77 78 79 80
## [71] 63 70 81 55s 72 82 84 87 86 76 83 79s 82s 73s
## [85] 68s 72s 64s 70s 66s 85 88 89 86s 77s 90 92 91 88s
## [99] 90s 93 94 91s 81s 84s 76s 69s 43s 48s 10 9 6 8
## [113] 1 0 -3 -4 -5 -1 -2 11 4 2 -6 7 3 37s
## [127] 52s 62s 57s 54s 75s 85s 59s 78s 45s 21s 5 30s 53s 34s
## [141] 95 80s 3s 32s 96 98 99 97 71s 60s 58s 19s -11 -7
## [155] -9s -10 -9 -12 -13 -15 -14 -17 -18 -20 -8 44s 120s 42s
## [169] 83s 92s 61s 65s 51s 56s 38s -16 -19 67s 94s 87s 89s 31s
## [183] 41s 46s 28s 49s 100 102 40s 101 103 -23 -25 -22 47s
## 195 Levels: -1 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -2 -20 -22 ... 99
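Scanning a long list of unique values by eye works, but the non-numeric entries can also be picked out directly. The sketch below uses a short made-up vector in place of the real column. If the column is a factor, as it is here, it would first need to be wrapped in “as.character”, because “as.numeric” applied to a factor returns the internal level codes rather than the printed values.

```r
# Made-up stand-in for a temperature column that was read in as text.
temps <- c("28", "30", "36s", "*", "", "-3", "120s")

# Anything that cannot be parsed as a number becomes NA;
# suppressWarnings hides the "NAs introduced by coercion" message.
not_numeric <- is.na(suppressWarnings(as.numeric(temps)))

# Show only the distinct values that failed to parse.
unique(temps[not_numeric])
```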
Hmm, it looks like the problem is that some temperatures have “s” as a suffix, like “69s” at the 106th place in the above output. What does the suffix “s” mean? We’d better read the documentation. On page 7, we see that “s” means a “suspect” value.
So, we can proceed by either including or excluding the “suspect” values. Let’s say we decide to remove them. Below is some R code for doing so, for the four variables I decided to use (“HOURLYDRYBULBTEMPC”, “HOURLYSeaLevelPressure”, “HOURLYRelativeHumidity”, and “HOURLYWindSpeed”). Additionally, I’ll need the date the information was recorded (“DATE”). Before I start to clean the dataset too much, I’ll reduce the size by only selecting May 18th and May 16th so that I can predict the temperature on the 18th from the 16th’s information. By reducing the size of the dataset, I can make the cleaning steps (removing “s”) go faster.
# as a first step let's re-read in the data, this time telling R to use the class
# "character" for variables that are not purely numerical.
dd<-read.table(myfile, sep=',', header=TRUE, stringsAsFactors=FALSE)
# these are the variables I decided to try in my models:
# you might want to include a few more or less
mydd<-dd[,c( "DATE",
"HOURLYDRYBULBTEMPC",
"HOURLYSeaLevelPressure",
"HOURLYRelativeHumidity",
"HOURLYWindSpeed")]
str(mydd)
## 'data.frame': 510637 obs. of 5 variables:
## $ DATE : chr "2009-01-01 00:53" "2009-01-01 01:53" "2009-01-01 02:53" "2009-01-01 03:53" ...
## $ HOURLYDRYBULBTEMPC : chr "-2.2" "-1.1" "-0.6" "0.0" ...
## $ HOURLYSeaLevelPressure: chr "30.13" "30.10" "30.11" "30.09" ...
## $ HOURLYRelativeHumidity: chr "92" "92" "89" "88" ...
## $ HOURLYWindSpeed : chr "0" "0" "3" "0" ...
Notice that now the first variable in “mydd” is DATE, and it is in the format “year-month-day hour:minute”. I could split this string into the various parts of a date (“year”, “month”, etc.) in order of appearance. But there’s a more powerful way to deal with dates in R. If our variable only had dates (and no times), we could cast it into a “Date” object. Since we have both the date and time in a single string, we’ll cast the DATE variable into a “chron” object from the “chron” library.
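A conversion of this kind might look like the sketch below. It assumes the “chron” package is installed and uses three made-up timestamps in the same layout as DATE, so the variable names (“stamps”, “parts”, “when”) are only for illustration.

```r
library(chron)  # assumes the chron package is installed

# Made-up timestamps in the same "year-month-day hour:minute" layout as DATE.
stamps <- c("2009-05-16 11:00", "2009-05-18 11:00", "2010-01-01 00:53")

# Split each string at the space into a date part and a time part.
parts <- do.call(rbind, strsplit(stamps, " ", fixed = TRUE))

# chron expects seconds in the time, so append ":00" before converting.
when <- chron(dates. = parts[, 1],
              times. = paste0(parts[, 2], ":00"),
              format = c(dates = "y-m-d", times = "h:m:s"))

# The pieces of each timestamp can then be pulled out directly.
years(when)    # ordered factor of years
months(when)   # ordered factor of month abbreviations
days(when)     # ordered factor of days of the month
hours(when)    # numeric hour
minutes(when)  # numeric minute
```

The output that follows comes from the full dataset rather than from these three toy values.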
## 'data.frame': 510637 obs. of 5 variables:
## $ DATE :Classes 'chron', 'dates', 'times' atomic [1:510637] 14245 14245 14245 14245 14245 ...
## .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
## .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
## .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
## .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
## $ HOURLYDRYBULBTEMPC : chr "-2.2" "-1.1" "-0.6" "0.0" ...
## $ HOURLYSeaLevelPressure: chr "30.13" "30.10" "30.11" "30.09" ...
## $ HOURLYRelativeHumidity: chr "92" "92" "89" "88" ...
## $ HOURLYWindSpeed : chr "0" "0" "3" "0" ...
## [1] (09/01/59 01:00:00) (04/29/18 23:59:00)
##
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 46950 41381 44821 42436 42104 39910 40320 40480 40258 42966 42937 46074
## [1] 42104 5
## [1] 1390 5
## [1] 1356 5
## [1] 1557 11
## [1] 56
## [1] 36
## [1] 52
## [1] 2000 2001 2002 2003
## 56 Levels: 1960 < 1961 < 1962 < 1963 < 1964 < 1965 < 1966 < ... < 2017
## [1] (05/18/00 23:59:00) (05/18/01 23:59:00) (05/18/02 23:59:00)
## [4] (05/18/03 23:59:00)
## [1] 52 5
## [1] 52 10
## [1] 52 10
## 'data.frame': 52 obs. of 10 variables:
## $ year : Ord.factor w/ 56 levels "1960"<"1961"<..: 1 2 3 4 5 6 7 8 9 10 ...
## $ hour : num 11 11 11 11 11 11 11 11 11 11 ...
## $ min : num 0 0 0 0 0 0 0 0 0 0 ...
## $ DATE.x :Classes 'chron', 'dates', 'times' atomic [1:52] -3515 -3150 -2785 -2420 -2054 ...
## .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
## .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
## .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
## .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
## $ HOURLYDRYBULBTEMPC.x : chr "7.2" "18.3" "15.6" "21.7" ...
## $ DATE.y :Classes 'chron', 'dates', 'times' atomic [1:52] -3517 -3152 -2787 -2422 -2056 ...
## .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
## .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
## .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
## .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
## $ HOURLYDRYBULBTEMPC.y : chr "8.9" "16.1" "12.2" "17.8" ...
## $ HOURLYSeaLevelPressure: chr "30.07" "29.94" "30.03" "30.17" ...
## $ HOURLYRelativeHumidity: chr "34" "34" "49" "40" ...
## $ HOURLYWindSpeed : chr "12" "6" "10" "0" ...
## 'data.frame': 52 obs. of 10 variables:
## $ year : Ord.factor w/ 56 levels "1960"<"1961"<..: 1 2 3 4 5 6 7 8 9 10 ...
## $ hour : num 11 11 11 11 11 11 11 11 11 11 ...
## $ min : num 0 0 0 0 0 0 0 0 0 0 ...
## $ dateMay18 :Classes 'chron', 'dates', 'times' atomic [1:52] -3515 -3150 -2785 -2420 -2054 ...
## .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
## .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
## .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
## .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
## $ tempMay18 : chr "7.2" "18.3" "15.6" "21.7" ...
## $ dateMay16 :Classes 'chron', 'dates', 'times' atomic [1:52] -3517 -3152 -2787 -2422 -2056 ...
## .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
## .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
## .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
## .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
## $ tempMay16 : chr "8.9" "16.1" "12.2" "17.8" ...
## $ HOURLYSeaLevelPressure: chr "30.07" "29.94" "30.03" "30.17" ...
## $ HOURLYRelativeHumidity: chr "34" "34" "49" "40" ...
## $ HOURLYWindSpeed : chr "12" "6" "10" "0" ...
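The “.x” and “.y” suffixes in the output above are what base R’s “merge” adds when both inputs share a column name that is not used as a key. One way to build a table like this is sketched below. The data are made up, the names “may18”, “may16” and “both” are only for illustration, and this is a guess at the general approach rather than the exact code that produced the output.

```r
# Made-up readings: one row per year, all taken at 11:00.
may18 <- data.frame(year = c(2000, 2001, 2002), hour = 11, min = 0,
                    DATE = c("2000-05-18 11:00", "2001-05-18 11:00",
                             "2002-05-18 11:00"),
                    HOURLYDRYBULBTEMPC = c("7.2", "18.3", "15.6"),
                    stringsAsFactors = FALSE)
may16 <- data.frame(year = c(2000, 2001, 2003), hour = 11, min = 0,
                    DATE = c("2000-05-16 11:00", "2001-05-16 11:00",
                             "2003-05-16 11:00"),
                    HOURLYDRYBULBTEMPC = c("8.9", "16.1", "17.8"),
                    HOURLYWindSpeed = c("12", "6", "0"),
                    stringsAsFactors = FALSE)

# Keep only the years present in both tables, matched on year, hour and minute.
# Shared non-key columns get ".x" (first table) and ".y" (second table).
both <- merge(may18, may16, by = c("year", "hour", "min"))

# Give the suffixed columns more descriptive names.
old <- c("DATE.x", "HOURLYDRYBULBTEMPC.x", "DATE.y", "HOURLYDRYBULBTEMPC.y")
new <- c("dateMay18", "tempMay18", "dateMay16", "tempMay16")
names(both)[match(old, names(both))] <- new
str(both)
```

Because “merge” keeps only matching rows by default, a year with a reading on just one of the two days drops out, which is one reason the row count can shrink at this step.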
Now that we have all the variables I wanted, for the specific days and times that I wanted, I can try to convert the variables that are currently “chr” to class “num”. First let’s note any “suspect” values and change them to “NA”.
# read about a few of the string operations in R: ?grep
# if a value is "suspect", mark it as NA
for(i in 1:nrow(May)) { # for each row
  for(j in c(5, 7, 8,9,10)) { # for each column that may have an "s"
    # "grepl" takes the pattern first and the string to search second.
    # if a value contains "s" then replace it with NA
    # if not, then keep the value
    May[i,j]<-ifelse(grepl("s", May[i,j], fixed=TRUE), NA, May[i,j] )
  }
}
# now that "s" are removed, it's safe to cast the character strings to numbers.
May$tempMay18<-as.numeric(May$tempMay18)
May$tempMay16<-as.numeric(May$tempMay16)
May$HOURLYSeaLevelPressure<-as.numeric(May$HOURLYSeaLevelPressure)
## Warning: NAs introduced by coercion
May$HOURLYRelativeHumidity<-as.numeric(May$HOURLYRelativeHumidity)
May$HOURLYWindSpeed<-as.numeric(May$HOURLYWindSpeed )
str(May)
## 'data.frame': 52 obs. of 10 variables:
## $ year : Ord.factor w/ 56 levels "1960"<"1961"<..: 1 2 3 4 5 6 7 8 9 10 ...
## $ hour : num 11 11 11 11 11 11 11 11 11 11 ...
## $ min : num 0 0 0 0 0 0 0 0 0 0 ...
## $ dateMay18 :Classes 'chron', 'dates', 'times' atomic [1:52] -3515 -3150 -2785 -2420 -2054 ...
## .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
## .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
## .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
## .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
## $ tempMay18 : num 7.2 18.3 15.6 21.7 17.8 18.9 18.9 21.1 18.3 17.8 ...
## $ dateMay16 :Classes 'chron', 'dates', 'times' atomic [1:52] -3517 -3152 -2787 -2422 -2056 ...
## .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
## .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
## .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
## .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
## $ tempMay16 : num 8.9 16.1 12.2 17.8 16.7 16.7 12.8 21.7 16.7 20.6 ...
## $ HOURLYSeaLevelPressure: num 30.1 29.9 30 30.2 29.8 ...
## $ HOURLYRelativeHumidity: num 34 34 49 40 29 41 31 31 29 33 ...
## $ HOURLYWindSpeed : num 12 6 10 0 29 13 0 0 7 5 ...
# if I write May to a csv file then I don't have to run the above code again, I can
# just read in May.csv when I want to train and test models in the next lab.
# write.table(May, "May.csv", sep=',', row.names=FALSE)
Now I have a cleaned and formatted dataset “May” with which I can train and test models.
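The row-by-row loop above can also be written one column at a time, since “grepl” works on a whole vector at once. The sketch below does that on a small made-up data frame (“toy”), then writes the result to a temporary file and reads it back to check that the numbers survive the round trip. The column names mirror the ones in “May”, but the values are invented.

```r
# Made-up stand-in for the merged data, still stored as text.
toy <- data.frame(tempMay18 = c("7.2", "18.3s", "15.6"),
                  HOURLYWindSpeed = c("12", "6", "10s"),
                  stringsAsFactors = FALSE)

for (nm in c("tempMay18", "HOURLYWindSpeed")) {
  x <- toy[[nm]]
  x[grepl("s", x, fixed = TRUE)] <- NA  # mark suspect values as missing
  toy[[nm]] <- as.numeric(x)            # the remaining strings are plain numbers
}
str(toy)

# Save the cleaned data and read it back in.
out <- tempfile(fileext = ".csv")
write.table(toy, out, sep = ",", row.names = FALSE)
back <- read.table(out, header = TRUE, sep = ",")
str(back)
```

On 52 rows the speed difference is negligible, but the column-wise version avoids indexing individual cells and scales better to the full dataset.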