Merging CSV files from the command line is a practical skill that can streamline your data processing tasks, especially when dealing with multiple datasets. Here’s a guide on how to do it using various command-line tools.
cat
Command in Unix-based SystemsThis command concatenates file1.csv
and
file2.csv
into a new file called
merged.csv
:
cat file1.csv file2.csv > merged.csv
If you have headers in the files and you want to remove repeated headers from subsequent files, use:
cat file1.csv > merged.csv && tail -n +2 file2.csv >> merged.csv
This command ensures that the header from
file1.csv
is kept, but it skips the header from
file2.csv
.
awk
for
More Controlawk
is a powerful text-processing tool, which is
handy for merging files while removing duplicate headers.
awk 'FNR==1 && NR!=1{next;}{print}' *.csv > merged.csv
This command tells awk
to skip the first row of
each file except the first one processed (FNR
is
the record number of the current file, NR
is the
total record number). It merges all CSV files in the directory
into merged.csv
.
csvkit
for Robust CSV Manipulationscsvkit
is a suite of utilities specifically
designed for converting and manipulating CSV files.
Install csvkit
if you haven’t
already:
pip install csvkit
Use csvstack
to merge
files:
csvstack file1.csv file2.csv > merged.csv
csvstack
handles columns and headers
intelligently, stacking rows from multiple files.
For those on Windows, PowerShell provides a method to merge CSV files similar to Unix commands.
Get-Content file1.csv, file2.csv | Select-Object -Unique > merged.csv
This command concatenates file1.csv
and
file2.csv
, and redirects the output to
merged.csv
.
Using command line tools to merge CSV files is efficient and can be adapted to various needs, from simple concatenations to handling large datasets with specific merging rules.