Introduction

This document will step you through the process of reading in and formatting weather data downloaded from NOAA.

Download dataset(s) from NOAA

I chose to download the “Local Climatological Data” (LCD) recorded at the Klamath Falls International Airport from 1959 to 2017. You can download weather data from other stations here. Here is the documentation of the LCD. Besides selecting a station to download from, you’ll need to specify the dates for which you want the data. I could only download 10 years’ worth of data at a time, so I had to combine the several downloads into one dataset, called “KFweather.csv”. I did this in R by reading in each .csv file, then using the function “rbind” to combine them and “write.table” to write the result into “KFweather.csv”. You can either download your own data from NOAA or use mine.
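
Here is a minimal sketch of that combining step. It is not the exact code I ran: the two tiny files, their names (“download1.csv”, “download2.csv”, “KFweather_demo.csv”) and their contents are made up so that the example runs on its own.

```r
# Two tiny stand-in files play the role of separate NOAA downloads.
# File names and values are hypothetical, for illustration only.
tmpdir <- tempdir()
part1 <- data.frame(DATE = c("1959-09-01 01:00", "1959-09-01 02:00"),
                    HOURLYDRYBULBTEMPF = c("55", "54"),
                    stringsAsFactors = FALSE)
part2 <- data.frame(DATE = c("1969-09-01 01:00", "1969-09-01 02:00"),
                    HOURLYDRYBULBTEMPF = c("60", "58s"),
                    stringsAsFactors = FALSE)
write.csv(part1, file.path(tmpdir, "download1.csv"), row.names = FALSE)
write.csv(part2, file.path(tmpdir, "download2.csv"), row.names = FALSE)

# Read every download as character so the pieces have matching column types,
# then stack them row-wise with rbind.
files <- file.path(tmpdir, c("download1.csv", "download2.csv"))
pieces <- lapply(files, read.csv, colClasses = "character")
combined <- do.call(rbind, pieces)

# Write the combined data to a single comma-separated file.
outfile <- file.path(tmpdir, "KFweather_demo.csv")
write.table(combined, outfile, sep = ",", row.names = FALSE)
```

Using “do.call(rbind, pieces)” means the same code works for any number of downloads, as long as every file has the same columns.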

Reading the dataset into R

After downloading the data to your computer, try opening it in R with the “read.table” function.

mypath<-"C:\\Users\\rosanna.overholser\\OneDrive - Oregon Institute of Technology\\Courses\\math407\\data\\weather\\"
myfile<-paste(mypath, "KFweather.csv", sep='')

# note that if I didn't use "header=TRUE" I would have gotten an error.
# also note that if I didn't use "sep=','" I would have gotten an error
# because "read.table" assumes that the values are separated by white space
# (spaces or tabs) unless "sep" is set to another delimiter.
dd<-read.table(myfile, header=TRUE, sep=',')

# wow, that's a lot of data!  510637 observations of 93 variables.
str(dd)
## 'data.frame':    510637 obs. of  93 variables:
##  $ X                                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ STATION                          : Factor w/ 1 level "WBAN:94236": 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATION_NAME                     : Factor w/ 1 level "KLAMATH FALLS INTERNATIONAL AIRPORT OR US": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ELEVATION                        : num  1245 1245 1245 1245 1245 ...
##  $ LATITUDE                         : num  42.1 42.1 42.1 42.1 42.1 ...
##  $ LONGITUDE                        : num  -122 -122 -122 -122 -122 ...
##  $ DATE                             : Factor w/ 510637 levels "1959-09-01 01:00",..: 407242 407243 407244 407245 407246 407247 407248 407249 407250 407251 ...
##  $ REPORTTPYE                       : Factor w/ 6 levels "AUTO","FM-15",..: 2 2 2 2 2 3 2 3 2 3 ...
##  $ HOURLYSKYCONDITIONS              : Factor w/ 19031 levels "","*","* SCT:04 100 BKN:07 200",..: 4677 13126 13120 13111 13109 13106 13104 3148 3148 16601 ...
##  $ HOURLYVISIBILITY                 : Factor w/ 153 levels "","*","0","0.00",..: 58 58 58 58 58 58 58 58 58 58 ...
##  $ HOURLYPRSENTWEATHERTYPE          : Factor w/ 373 levels "","-DZ:01 |DZ:51 |",..: 1 1 348 1 348 348 13 14 14 13 ...
##  $ HOURLYDRYBULBTEMPF               : Factor w/ 195 levels "","-1","-10",..: 55 59 61 63 65 63 65 66 66 71 ...
##  $ HOURLYDRYBULBTEMPC               : Factor w/ 462 levels "","-0.2","-0.5",..: 73 9 4 167 171 167 171 173 175 354 ...
##  $ HOURLYWETBULBTEMPF               : Factor w/ 96 levels "","-1","-10",..: 46 48 50 51 52 51 52 52 53 56 ...
##  $ HOURLYWETBULBTEMPC               : Factor w/ 512 levels "","-0.1","-0.2",..: 136 17 14 8 2 6 289 290 292 420 ...
##  $ HOURLYDewPointTempF              : Factor w/ 165 levels "","-1","-10",..: 72 76 76 78 81 81 83 81 83 89 ...
##  $ HOURLYDewPointTempC              : Factor w/ 399 levels "","-0.1","-0.2",..: 156 103 103 19 12 10 6 10 6 252 ...
##  $ HOURLYRelativeHumidity           : Factor w/ 97 levels "","*","10","100",..: 91 91 87 86 87 92 91 85 87 85 ...
##  $ HOURLYWindSpeed                  : Factor w/ 61 levels "","0","1","10",..: 2 2 28 2 51 28 2 28 2 12 ...
##  $ HOURLYWindDirection              : Factor w/ 62 levels "","0","000","010",..: 3 3 18 3 16 7 3 14 3 36 ...
##  $ HOURLYWindGustSpeed              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ HOURLYStationPressure            : Factor w/ 170 levels "","24.86","24.87",..: 111 109 109 108 106 105 105 105 107 105 ...
##  $ HOURLYPressureTendency           : int  6 NA NA 8 NA NA NA NA 5 NA ...
##  $ HOURLYPressureChange             : Factor w/ 67 levels "","-0.00","-0.01",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ HOURLYSeaLevelPressure           : Factor w/ 213 levels "","28.91","28.92",..: 134 130 131 128 127 1 126 1 128 1 ...
##  $ HOURLYPrecip                     : Factor w/ 53 levels "","0.00","0.01",..: 53 2 53 53 53 53 3 3 3 1 ...
##  $ HOURLYAltimeterSetting           : Factor w/ 209 levels "","28.91","28.92",..: 138 135 136 134 131 130 130 130 132 130 ...
##  $ DAILYMaximumDryBulbTemp          : Factor w/ 99 levels "","100","101",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYMinimumDryBulbTemp          : Factor w/ 119 levels "","-1","-10",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYAverageDryBulbTemp          : Factor w/ 126 levels "","-2","-6","-7",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYDeptFromNormalAverageTemp   : Factor w/ 460 levels "","-0.1","-0.1s",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYAverageRelativeHumidity     : logi  NA NA NA NA NA NA ...
##  $ DAILYAverageDewPointTemp         : logi  NA NA NA NA NA NA ...
##  $ DAILYAverageWetBulbTemp          : logi  NA NA NA NA NA NA ...
##  $ DAILYHeatingDegreeDays           : Factor w/ 105 levels "","0","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYCoolingDegreeDays           : Factor w/ 25 levels "","0","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYSunrise                     : int  736 736 736 736 736 736 736 736 736 736 ...
##  $ DAILYSunset                      : int  1646 1646 1646 1646 1646 1646 1646 1646 1646 1646 ...
##  $ DAILYWeather                     : Factor w/ 145 levels "","BR:13","BR:13 FG:01",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYPrecip                      : Factor w/ 112 levels "","0.00","0.00s",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYSnowfall                    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ DAILYSnowDepth                   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ DAILYAverageStationPressure      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DAILYAverageSeaLevelPressure     : logi  NA NA NA NA NA NA ...
##  $ DAILYAverageWindSpeed            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ DAILYPeakWindSpeed               : Factor w/ 96 levels "","10","11","12",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ PeakWindDirection                : Factor w/ 72 levels "","010","020",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYSustainedWindSpeed          : Factor w/ 56 levels "","10","10s",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ DAILYSustainedWindDirection      : Factor w/ 55 levels "","010","020",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyMaximumTemp               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlyMinimumTemp               : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlyMeanTemp                  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlyAverageRH                 : logi  NA NA NA NA NA NA ...
##  $ MonthlyDewpointTemp              : logi  NA NA NA NA NA NA ...
##  $ MonthlyWetBulbTemp               : logi  NA NA NA NA NA NA ...
##  $ MonthlyAvgHeatingDegreeDays      : logi  NA NA NA NA NA NA ...
##  $ MonthlyAvgCoolingDegreeDays      : logi  NA NA NA NA NA NA ...
##  $ MonthlyStationPressure           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlySeaLevelPressure          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlyAverageWindSpeed          : logi  NA NA NA NA NA NA ...
##  $ MonthlyTotalSnowfall             : logi  NA NA NA NA NA NA ...
##  $ MonthlyDeptFromNormalMaximumTemp : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlyDeptFromNormalMinimumTemp : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlyDeptFromNormalAverageTemp : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlyDeptFromNormalPrecip      : Factor w/ 145 levels "","-0.01s","-0.03s",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyTotalLiquidPrecip         : Factor w/ 135 levels "","0.00","0.00s",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyGreatestPrecip            : logi  NA NA NA NA NA NA ...
##  $ MonthlyGreatestPrecipDate        : logi  NA NA NA NA NA NA ...
##  $ MonthlyGreatestSnowfall          : logi  NA NA NA NA NA NA ...
##  $ MonthlyGreatestSnowfallDate      : logi  NA NA NA NA NA NA ...
##  $ MonthlyGreatestSnowDepth         : logi  NA NA NA NA NA NA ...
##  $ MonthlyGreatestSnowDepthDate     : logi  NA NA NA NA NA NA ...
##  $ MonthlyDaysWithGT90Temp          : Factor w/ 26 levels "","0","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyDaysWithLT32Temp          : Factor w/ 23 levels "","0","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyDaysWithGT32Temp          : Factor w/ 37 levels "","0","1","10",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyDaysWithLT0Temp           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ MonthlyDaysWithGT001Precip       : logi  NA NA NA NA NA NA ...
##  $ MonthlyDaysWithGT010Precip       : logi  NA NA NA NA NA NA ...
##  $ MonthlyDaysWithGT1Snow           : logi  NA NA NA NA NA NA ...
##  $ MonthlyMaxSeaLevelPressureValue  : logi  NA NA NA NA NA NA ...
##  $ MonthlyMaxSeaLevelPressureDate   : int  -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
##  $ MonthlyMaxSeaLevelPressureTime   : int  -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
##  $ MonthlyMinSeaLevelPressureValue  : logi  NA NA NA NA NA NA ...
##  $ MonthlyMinSeaLevelPressureDate   : int  -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
##  $ MonthlyMinSeaLevelPressureTime   : int  -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 ...
##  $ MonthlyTotalHeatingDegreeDays    : Factor w/ 153 levels "","*","0s","1",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyTotalCoolingDegreeDays    : Factor w/ 66 levels "","0","1","10",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyDeptFromNormalHeatingDD   : Factor w/ 132 levels "","-1","-105",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyDeptFromNormalCoolingDD   : Factor w/ 62 levels "","-1","-10",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MonthlyTotalSeasonToDateHeatingDD: logi  NA NA NA NA NA NA ...
##  $ MonthlyTotalSeasonToDateCoolingDD: logi  NA NA NA NA NA NA ...
##  $ Date                             : Factor w/ 510637 levels "(01/01/00 23:59:00)",..: 193 194 195 196 197 198 199 200 201 202 ...
##  $ Month                            : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 5 5 5 5 5 5 5 5 5 ...

Notice in the output of “str(dd)” that many variables that should be quantitative (i.e. numerical) ended up as “factors” in R. This is not ideal, since the models we eventually train will be fit differently depending on whether R thinks the predictors are quantitative or qualitative (i.e. “factors”). Let’s take a closer look at one variable we’re certainly interested in, the temperature in degrees F (HOURLYDRYBULBTEMPF).

There are too many observations to print all the values of dd$HOURLYDRYBULBTEMPF, and if I used “head” to see only the first 6, I probably would not see enough values to learn why R called the variable a factor instead of a numerical variable. The function “unique” is a nice way to remove the repeats but still see every observed value.

unique(dd$HOURLYDRYBULBTEMPF) 
##   [1] 28   30   31   32   33   34   37   36   38   41   42   43   39       
##  [15] 35   29   27   24   22   21   20   19   18   17   16   15   26   23  
##  [29] 14   13   12   25   40   44   45   49   47   51   46   36s  48   50  
##  [43] 52   54   59   61   39s  53   56   58   57   55   60   62   *    64  
##  [57] 66   65   67   63s  68   69   71   73   74   75   77   78   79   80  
##  [71] 63   70   81   55s  72   82   84   87   86   76   83   79s  82s  73s 
##  [85] 68s  72s  64s  70s  66s  85   88   89   86s  77s  90   92   91   88s 
##  [99] 90s  93   94   91s  81s  84s  76s  69s  43s  48s  10   9    6    8   
## [113] 1    0    -3   -4   -5   -1   -2   11   4    2    -6   7    3    37s 
## [127] 52s  62s  57s  54s  75s  85s  59s  78s  45s  21s  5    30s  53s  34s 
## [141] 95   80s  3s   32s  96   98   99   97   71s  60s  58s  19s  -11  -7  
## [155] -9s  -10  -9   -12  -13  -15  -14  -17  -18  -20  -8   44s  120s 42s 
## [169] 83s  92s  61s  65s  51s  56s  38s  -16  -19  67s  94s  87s  89s  31s 
## [183] 41s  46s  28s  49s  100  102  40s  101  103  -23  -25  -22  47s 
## 195 Levels:  -1 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -2 -20 -22 ... 99

Hmm, it looks like the problem is that some temperatures have “s” as a suffix, like “69s” at the 106th place in the above output. (There is also a “*” at the 55th place and a blank value at the 14th place.) What does a suffix “s” mean? We’d better read the documentation. On page 7, we see that “s” means a “suspect” value.
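
With 195 distinct values it is easy to miss an odd one by eye. As a rough sketch, the values that R cannot read as numbers can be listed directly. The vector “temps” below is a made-up stand-in for dd$HOURLYDRYBULBTEMPF, stored as character strings.

```r
# Toy stand-in for the temperature column (hypothetical values).
temps <- c("28", "30", "36s", "*", "", "-3", "69s", "30")

# Keep each distinct value once, then find the ones that do not convert.
# suppressWarnings hides the "NAs introduced by coercion" warning.
vals <- unique(temps)
bad <- vals[is.na(suppressWarnings(as.numeric(vals)))]
bad

# Count how many observations end in "s" (the "suspect" flag).
n_suspect <- sum(grepl("s$", temps))
n_suspect
```

If the column is a factor, as in the first read of the data, wrap it in “as.character” before trying this.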

So, we can proceed by either including or excluding the “suspect” values. Let’s say we decide to remove them. Below is some R code for doing so, for the four variables I decided to use (“HOURLYDRYBULBTEMPC”, “HOURLYSeaLevelPressure”, “HOURLYRelativeHumidity”, and “HOURLYWindSpeed”). Additionally, I’ll need the date the information was recorded (“DATE”). Before I start to clean the dataset too much, I’ll reduce the size by only selecting May 18th and May 16th so that I can predict the temperature on the 18th from the 16th’s information. By reducing the size of the dataset, I can make the cleaning steps (removing “s”) go faster.

# as a first step let's re-read in the data, this time telling R to use the class
# "character" for variables that are not purely numerical.

dd<-read.table(myfile, sep=',', header=TRUE, stringsAsFactors=FALSE)

# these are the variables I decided to try in my models: 
# you might want to include a few more or less

mydd<-dd[,c(    "DATE",
                "HOURLYDRYBULBTEMPC",
                "HOURLYSeaLevelPressure",
                "HOURLYRelativeHumidity",
                "HOURLYWindSpeed")]

str(mydd)
## 'data.frame':    510637 obs. of  5 variables:
##  $ DATE                  : chr  "2009-01-01 00:53" "2009-01-01 01:53" "2009-01-01 02:53" "2009-01-01 03:53" ...
##  $ HOURLYDRYBULBTEMPC    : chr  "-2.2" "-1.1" "-0.6" "0.0" ...
##  $ HOURLYSeaLevelPressure: chr  "30.13" "30.10" "30.11" "30.09" ...
##  $ HOURLYRelativeHumidity: chr  "92" "92" "89" "88" ...
##  $ HOURLYWindSpeed       : chr  "0" "0" "3" "0" ...

Notice that now the first variable in “mydd” is DATE, and it is in the format “year-month-day hour:minute”. I could split this string into “year”, “month”, etc. by pulling out the characters in order of appearance, but there’s a more powerful way to deal with dates in R. If our variable only had dates (and no times), we could cast it into a “Date” object. Since we have both the date and time in a single string, we’ll cast the DATE variable into a “chron” object from the “chron” library.
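
Here is a minimal sketch of such a conversion on a toy vector rather than the full dataset. It assumes the “chron” package is installed. The names “date_strings”, “when” and “is_may18” are made up for illustration, and this is not necessarily the exact code behind the output that follows.

```r
library(chron)  # assumption: install.packages("chron") has been run

# Toy stand-in for mydd$DATE, in the same "year-month-day hour:minute" format.
date_strings <- c("2009-05-16 11:00", "2009-05-18 11:00", "2010-01-01 00:53")

# Split each string into its date part and its time part.
parts <- strsplit(date_strings, " ")
date_part <- sapply(parts, `[`, 1)
time_part <- paste0(sapply(parts, `[`, 2), ":00")  # chron wants seconds too

# Build the chron object; out.format controls how it is printed.
when <- chron(dates. = date_part, times. = time_part,
              format = c(dates = "y-m-d", times = "h:m:s"),
              out.format = c(dates = "m/d/y", times = "h:m:s"))

# chron provides helpers for pulling out pieces of the date.
months(when)   # ordered factor of month abbreviations
years(when)    # ordered factor of years
hours(when)    # numeric hour of the day

# Pick out the observations taken on May 18th.
is_may18 <- as.character(months(when)) == "May" &
            as.character(days(when)) == "18"
when[is_may18]
```

Once the dates are “chron” objects, selecting a particular month, day or hour becomes a logical comparison instead of string surgery.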

## 'data.frame':    510637 obs. of  5 variables:
##  $ DATE                  :Classes 'chron', 'dates', 'times'  atomic [1:510637] 14245 14245 14245 14245 14245 ...
##   .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
##   .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
##   .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
##   .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
##  $ HOURLYDRYBULBTEMPC    : chr  "-2.2" "-1.1" "-0.6" "0.0" ...
##  $ HOURLYSeaLevelPressure: chr  "30.13" "30.10" "30.11" "30.09" ...
##  $ HOURLYRelativeHumidity: chr  "92" "92" "89" "88" ...
##  $ HOURLYWindSpeed       : chr  "0" "0" "3" "0" ...
## [1] (09/01/59 01:00:00) (04/29/18 23:59:00)
## 
##   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec 
## 46950 41381 44821 42436 42104 39910 40320 40480 40258 42966 42937 46074
## [1] 42104     5
## [1] 1390    5
## [1] 1356    5
## [1] 1557   11
## [1] 56
## [1] 36
## [1] 52
## [1] 2000 2001 2002 2003
## 56 Levels: 1960 < 1961 < 1962 < 1963 < 1964 < 1965 < 1966 < ... < 2017
## [1] (05/18/00 23:59:00) (05/18/01 23:59:00) (05/18/02 23:59:00)
## [4] (05/18/03 23:59:00)
## [1] 52  5
## [1] 52 10
## [1] 52 10
## 'data.frame':    52 obs. of  10 variables:
##  $ year                  : Ord.factor w/ 56 levels "1960"<"1961"<..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ hour                  : num  11 11 11 11 11 11 11 11 11 11 ...
##  $ min                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ DATE.x                :Classes 'chron', 'dates', 'times'  atomic [1:52] -3515 -3150 -2785 -2420 -2054 ...
##   .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
##   .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
##   .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
##   .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
##  $ HOURLYDRYBULBTEMPC.x  : chr  "7.2" "18.3" "15.6" "21.7" ...
##  $ DATE.y                :Classes 'chron', 'dates', 'times'  atomic [1:52] -3517 -3152 -2787 -2422 -2056 ...
##   .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
##   .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
##   .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
##   .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
##  $ HOURLYDRYBULBTEMPC.y  : chr  "8.9" "16.1" "12.2" "17.8" ...
##  $ HOURLYSeaLevelPressure: chr  "30.07" "29.94" "30.03" "30.17" ...
##  $ HOURLYRelativeHumidity: chr  "34" "34" "49" "40" ...
##  $ HOURLYWindSpeed       : chr  "12" "6" "10" "0" ...
## 'data.frame':    52 obs. of  10 variables:
##  $ year                  : Ord.factor w/ 56 levels "1960"<"1961"<..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ hour                  : num  11 11 11 11 11 11 11 11 11 11 ...
##  $ min                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ dateMay18             :Classes 'chron', 'dates', 'times'  atomic [1:52] -3515 -3150 -2785 -2420 -2054 ...
##   .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
##   .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
##   .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
##   .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
##  $ tempMay18             : chr  "7.2" "18.3" "15.6" "21.7" ...
##  $ dateMay16             :Classes 'chron', 'dates', 'times'  atomic [1:52] -3517 -3152 -2787 -2422 -2056 ...
##   .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
##   .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
##   .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
##   .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
##  $ tempMay16             : chr  "8.9" "16.1" "12.2" "17.8" ...
##  $ HOURLYSeaLevelPressure: chr  "30.07" "29.94" "30.03" "30.17" ...
##  $ HOURLYRelativeHumidity: chr  "34" "34" "49" "40" ...
##  $ HOURLYWindSpeed       : chr  "12" "6" "10" "0" ...
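
The “.x” and “.y” endings in the output above are what the base R function “merge” adds when both inputs share a column name that is not used for matching. Here is a small sketch of a join like that, followed by a rename. The toy data frames “may18” and “may16” and their values are made up, and I am assuming the match is on year, hour and minute.

```r
# Toy stand-ins: one row per year for each of the two days (hypothetical values).
may18 <- data.frame(year = c(1960, 1961, 1962), hour = 11, min = 0,
                    DATE = c("1960-05-18", "1961-05-18", "1962-05-18"),
                    HOURLYDRYBULBTEMPC = c("7.2", "18.3", "15.6"),
                    stringsAsFactors = FALSE)
may16 <- data.frame(year = c(1960, 1961, 1963), hour = 11, min = 0,
                    DATE = c("1960-05-16", "1961-05-16", "1963-05-16"),
                    HOURLYDRYBULBTEMPC = c("8.9", "16.1", "17.8"),
                    HOURLYWindSpeed = c("12", "6", "0"),
                    stringsAsFactors = FALSE)

# Keep only the years present in both; shared non-key columns get .x and .y.
both <- merge(may18, may16, by = c("year", "hour", "min"))

# Give the suffixed columns more descriptive names.
new_names <- c(DATE.x = "dateMay18", HOURLYDRYBULBTEMPC.x = "tempMay18",
               DATE.y = "dateMay16", HOURLYDRYBULBTEMPC.y = "tempMay16")
names(both)[match(names(new_names), names(both))] <- unname(new_names)
names(both)
```

Years that appear in only one of the two inputs (1962 and 1963 here) are dropped, which is the default behavior of “merge”.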

Now that we have all the variables I wanted, for the specific days and times that I wanted, I can try to convert the variables that are currently “chr” to class “num”. First let’s note any “suspect” values and change them to “NA”.

# read about a few of the string operations in R: ?grepl
# if a value is "suspect", mark it as NA
for(i in 1:nrow(May)) {    # for each row
  for(j in c(5, 7, 8, 9, 10)) {  # for each column that may have an "s"

     # if a value contains "s" then replace it with NA
     # if not, then keep the value
     # (grepl takes the pattern first, then the string to search)
     May[i,j]<-ifelse(grepl("s", May[i, j], fixed=TRUE), NA, May[i,j] )
  }
}

# now that "s" are removed, it's safe to cast the character strings to numbers.
May$tempMay18<-as.numeric(May$tempMay18)
May$tempMay16<-as.numeric(May$tempMay16)
May$HOURLYSeaLevelPressure<-as.numeric(May$HOURLYSeaLevelPressure)
## Warning: NAs introduced by coercion
May$HOURLYRelativeHumidity<-as.numeric(May$HOURLYRelativeHumidity)
May$HOURLYWindSpeed<-as.numeric(May$HOURLYWindSpeed )
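
The same cleaning can be done one column at a time instead of one cell at a time. Below is a sketch on a toy data frame. “May_demo” and the helper “clean_one” are made-up names, not part of the dataset or of any package.

```r
# Toy stand-in for two of the character columns in May (hypothetical values).
May_demo <- data.frame(tempMay18 = c("7.2", "18.3s", "15.6"),
                       tempMay16 = c("8.9", "16.1", "*"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: blank out suspect values, then convert to numbers.
clean_one <- function(x) {
  x[grepl("s", x, fixed = TRUE)] <- NA   # values flagged with "s" become NA
  suppressWarnings(as.numeric(x))        # anything else non-numeric becomes NA
}

# Apply the helper to each column that needs cleaning.
cols <- c("tempMay18", "tempMay16")
May_demo[cols] <- lapply(May_demo[cols], clean_one)
str(May_demo)
```

Referring to the columns by name rather than by position also keeps the code correct if the column order changes.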

str(May)
## 'data.frame':    52 obs. of  10 variables:
##  $ year                  : Ord.factor w/ 56 levels "1960"<"1961"<..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ hour                  : num  11 11 11 11 11 11 11 11 11 11 ...
##  $ min                   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ dateMay18             :Classes 'chron', 'dates', 'times'  atomic [1:52] -3515 -3150 -2785 -2420 -2054 ...
##   .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
##   .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
##   .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
##   .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
##  $ tempMay18             : num  7.2 18.3 15.6 21.7 17.8 18.9 18.9 21.1 18.3 17.8 ...
##  $ dateMay16             :Classes 'chron', 'dates', 'times'  atomic [1:52] -3517 -3152 -2787 -2422 -2056 ...
##   .. ..- attr(*, "format")= Named chr [1:2] "m/d/y" "h:m:s"
##   .. .. ..- attr(*, "names")= chr [1:2] "dates" "times"
##   .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970
##   .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year"
##  $ tempMay16             : num  8.9 16.1 12.2 17.8 16.7 16.7 12.8 21.7 16.7 20.6 ...
##  $ HOURLYSeaLevelPressure: num  30.1 29.9 30 30.2 29.8 ...
##  $ HOURLYRelativeHumidity: num  34 34 49 40 29 41 31 31 29 33 ...
##  $ HOURLYWindSpeed       : num  12 6 10 0 29 13 0 0 7 5 ...
# if I write May to a csv file then I don't have to run the above code again, I can 
# just read in May.csv when I want to train and test models in the next lab.
# write.table(May, "May.csv", sep=',', row.names=FALSE)

Now I have a cleaned and formatted dataset “May” with which I can train and test models.
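
To check that a saved file reads back the way it was written, a quick round trip helps. This sketch uses a made-up two-column data frame, “May_demo”, and a temporary file, so it does not touch the real data.

```r
# Toy stand-in for the cleaned data (hypothetical values).
May_demo <- data.frame(tempMay18 = c(7.2, 18.3), tempMay16 = c(8.9, 16.1))

# Write to a temporary comma-separated file, without row names.
outfile <- file.path(tempdir(), "May_demo.csv")
write.table(May_demo, outfile, sep = ",", row.names = FALSE)

# Read it back with the same separator and compare with the original.
May_back <- read.table(outfile, header = TRUE, sep = ",")
same <- isTRUE(all.equal(May_demo, May_back))
same
```

Leaving out the row names avoids picking up an extra index column (like the “X” variable in the first “str(dd)” output) when the file is read in again.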