對這文章發表回應
發表限制: 非會員 可以發表
PowerShell Basics #1: Reading and parsing CSV
I will be giving a talk on the topic of “PowerShell for Developers” at TechDays 2010 in Helsinki, Finland. As a warm-up to my presentation, I will be publishing a series of blog posts on various aspects of PowerShell. My viewpoint is specifically one of developer utility: What can PowerShell do to make a dev’s life easier?
I want to start with something that touches on data. Often, developers receive data in Excel format – usually in order to then import it into a database. Reading data from Excel is somewhat painful, but fortunately, Excel allows for easy saving to the CSV format. PowerShell, on the other hand, provides for several quite easy manipulations.
Simple imports with Import-Csv
Let’s start with a simple CSV file, customers.csv:
ID,Name,Country
1,John,United States
2,Beatrice,Germany
3,Jouni,Finland
4,Marcel,France
Turning this into objects in PowerShell is very straightforward:
PS D:\temp> import-csv customers.csv
ID Name Country
-- ---- -------
1 John United States
2 Beatrice Germany
3 Jouni Finland
4 Marcel France
As you can see, the first line in the text file gets parsed as a header, specifying the names on the PowerShell objects. The import doesn’t have a notion of strong typing; therefore, all the properties are imported as pure text. Often this is enough, but if it isn’t, look below…
Headerlessness and other cultures
There are a few scenarios where this won’t work. For example, if your CSV doesn’t have headers, you would get objects with column names such as “1”, “John” and “United States”. Lacking headers, you can supply them as a parameter:
import-csv .\customers.csv -header ID,Name,Country
That was easy (but don’t do it when your data has headers, or you end up duplicating them).
Well then, perhaps you live in a region where the field separator isn’t the usual comma? This is no problem to PowerShell, either
PS D:\temp> type customers-fi.csv
ID;Name;Country
1;John;United States
2;Beatrice;Germany
PS D:\temp> import-csv .\customers-fi.csv -delimiter ';'
ID Name Country
-- ---- -------
1 John United States
2 Beatrice Germany
If you know the file originated from your current UI culture, you can just dispense with the delimiter specification and type import-csv –useCulture customers-fi.csv
. That will pick up the delimiter from your Windows settings.
When your CSV ain’t a file…
Often you get your CSV data in a file. Occasionally, you might download it through HTTP, or even pull it from a database. No matter how, you may end up with an array of strings that contains your CSV data. The Import-Csv cmdlet reads a file, but if you need to parse the data from another source, use ConvertFrom-Csv.
PS D:\temp> $csv = 'ID,Name,Country
>> 1,John,United States
>> 2,Beatrice,Germany'
>>
PS D:\temp> ConvertFrom-Csv $csv
ID Name Country
-- ---- -------
1 John United States
2 Beatrice Germany
As far as the culture switches go, everything discussed above also applies to ConvertFrom-Csv.
How about CSV content oddities?
There are some uses of CSV that veer away from the normal, safe path. The first and a reasonably common scenario is having the field delimiter in the data, something that is usually handled by quoting the field. Of course, up next is the scenario where a field contains the quotation mark.
And finally, there is the really controversial aspect of having a newline in a CSV field. Many parsers struggle with this, and in fact, the correct behavior isn’t exactly clear. Of course, for practical purposes, anything that makes Excel exports work correctly is usually good. But let’s look at an example that contains all of these anomalies (the original content for this is shown in the Excel screenshot to the right).
Number,String,Multiline,Date
512,"Comma,Rocks","Line 1
Line 2",15.1.2010 15:14
57,"Cool ""quotes""","First
Second",7.1.2010 9:33
The data is split across five lines, but actually contains two records and a header. This alone is somewhat controversial, given CSV’s starting point of one line per record. Anyway, it often still needs to be parsed, and PowerShell does a good job here:
PS D:\temp> Import-Csv .\oddstrings.csv
Number String Multiline Date
------ ------ --------- ----
512 Comma,Rocks Line 1... 15.1.2010 15:14
57 Cool "quotes" First... 7.1.2010 9:33
There is one key thing to notice. Import-Csv works, because it treats the data source as a single whole. However, ConvertFrom-Csv misparses the multiline fields, as it handles the input line-by-line.
PS D:\temp> type .\oddstrings.csv | ConvertFrom-Csv
Number String Multiline Date
------ ------ --------- ----
512 Comma,Rocks Line 1
Line 2" 15.1.2010 15:14
57 Cool "quotes" First
Second" 7.1.2010 9:33
Strong typing, then?
For this, there are no pre-cooked solutions. But once your data is imported correctly, separators, multiline fields and all, it’s rather easy to just typecast the stuff, providing you input validation at the same time. Consider this CSV file of event descriptions:
StartsAt,Title,Venue
9.3.2010,TechDays 2010,Helsinki
15.2.2010,Mobile World Congress,Barcelona
Next, you want to filter the data to just show the occurring within the next 30 days. For this, you'll want the datetimes parsed into System.DateTime objects.
PS D:\temp> $events = Import-Csv event.csv | foreach {
New-Object PSObject -prop @{
StartsAt = [DateTime]::Parse($_.StartsAt);
Title = $_.Title;
Venue = $_.Venue
}
}
PS D:\temp> $events
StartsAt Venue Title
-------- ----- -----
9.3.2010 0:00:00 Helsinki TechDays 2010
15.2.2010 0:00:00 Barcelona Mobile World Congress
Now, filtering the list is a snap.
PS D:\temp> $events | where { $_.StartsAt -lt (get-date).AddDays(30) }
StartsAt Venue Title
-------- ----- -----
15.2.2010 0:00:00 Barcelona Mobile World Congress
One more thing to notice here: In the example above, it worked because the date format (“15.2.2010”) was in Finnish and I happened to run the PowerShell in a thread with the Finnish culture. However, if your data happens to come from a culture different than your UI, you need to pass in the correct culture specifier. For example, to parse a date in the US locale, use the following:
$usCulture = [System.Globalization.CultureInfo]::CreateSpecificCulture("en-US")
[DateTime]::Parse("04/07/2009", $usCulture)
Note that specifying the culture explicitly is always a good idea if you plan on saving the script and reusing it later. Although you might pay attention to the parsing details at the time of writing, it is quite conceivable for someone to run the script later on in another thread. As dates are prone to silent misparsing (for example, is 04/07/2009 7th April or 4th July?), you could end up manipulating incorrect data.
In addition to dates, you’ll want to look at cultures when parsing decimal numbers and such. Remember: the –UseCulture switch on Import-Csv only applies to the separator settings, not the optional parsing phase.
Enjoy!
原文出處: Heikniemi Hardcoded » PowerShell Basics #1: Reading and parsing CSV