PE1AQP's Notes about vCard and .vcf file handling.
Not really a ham-radio topic, but this is (for me) a convenient web-site to present these ideas.
Introduction
Recently I had reason to copy contact information from some older smartphones to a newer one.
I don't like the idea of "saving" my contact information in a cloud operated by some big-tech company.
So I exported this info into a contacts.vcf file and copied that file onto my main PC.
From there it could be copied onto my new phone.
But it was quickly clear that this .vcf file needed some cleaning.
It contained duplicate entries with only minor spelling variations of the names, telephone numbers without any further info and even entries for people long since deceased.
I searched the internet for tools to ease the handling of such .vcf files and came across this web-page by 'Vermaden'.
Exactly the thing I like: a UNIXy, command-line way of handling .vcf files.
Short description
For a fuller description, see Vermaden's web-page. But the short of it is:
- A script converts the multi-line vCard entries in a .vcf file into sorted single-line entries in (what I call) an archive file.
- These single-line entries can be handled by the usual UNIX cli-tools, such as grep to select certain entries or column to present a nicely formatted table.
- A second script can search this archive file for duplicate telephone numbers. The file can then be cleaned with any standard text-editor like emacs or vi.
- It makes, in my view, sense to store these archive files in a version control repository like GIT or SVN, or even simple local versioning systems like RCS. (But, of course, a system with a repository under your own control!)
- A third script re-assembles the selected and/or corrected archive files back into multi-line vCards in a .vcf file.
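As a rough illustration of the idea (not Vermaden's actual scripts, and not their real archive format), the flatten, select and re-assemble steps could be sketched in Python as follows. The `|` field separator, the function names `vcf_to_archive` and `archive_to_vcf`, and the sample contacts are all assumptions made up for this example; no vCard version conversion is attempted.

```python
def vcf_to_archive(vcf_text):
    """Flatten every BEGIN:VCARD..END:VCARD block into one line and sort."""
    entries, current = [], None
    for line in vcf_text.splitlines():
        line = line.strip()
        if line == "BEGIN:VCARD":
            current = []                      # start collecting a new card
        elif line == "END:VCARD":
            if current is not None:
                entries.append("|".join(current))
            current = None
        elif current is not None and line:
            current.append(line)              # one property per field
    return sorted(entries)


def archive_to_vcf(entries):
    """Re-assemble single-line entries into multi-line vCards."""
    cards = []
    for entry in entries:
        cards.append("\n".join(["BEGIN:VCARD", *entry.split("|"), "END:VCARD"]))
    return "\n".join(cards) + "\n"


# Made-up contacts, for illustration only.
sample = """BEGIN:VCARD
VERSION:4.0
FN:Zoe Example
TEL:+493023125011
END:VCARD
BEGIN:VCARD
VERSION:4.0
FN:Adam Example
TEL:+493023125010
END:VCARD
"""

archive = vcf_to_archive(sample)
for entry in archive:
    print(entry)

# Selecting entries, much as grep would on the archive file.
print([entry for entry in archive if "Adam" in entry])
```

Because every contact is a single line, sorting, selecting and diffing under version control all work on whole contacts rather than on fragments of them.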
Required improvement
I found that I had a few vCard entries from people with German "umlauts" and other non-ASCII characters in their names.
These names appeared as quoted-printable UTF-8 in the .vcf files generated by the phones.
For example:
BEGIN:VCARD
VERSION:2.1
N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=47=72=C3=BC=6E=65=62=61=75=6D;=42=61=72=74=68=6F=6C=6F=6D=C3=A4=75=73;;;
FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=42=61=72=74=68=6F=6C=6F=6D=C3=A4=75=73=20=47=72=C3=BC=6E=65=62=61=75=
=6D
TEL;HOME:+493023125010
END:VCARD
No need to worry about privacy: this is a made-up name with a so-called drama telephone number.
My latest phone has the option to export the contacts in vCard version 4. This version specifies that all payload strings are in UTF-8, which is of course also the encoding I use on all my computers.
But older phones only export to vCard version 2.1, which uses this quoted-printable variant of UTF-8.
I therefore needed an adaptation of the "from vcf to archive" script that translates quoted-printable text to UTF-8 but leaves V4 strings unmodified.
I found that the program qprint fitted the bill.
It is available in the repositories of most UNIX-like/*BSD/Linux distributions, even when it is not installed by default.
This program needs to process the .vcf file as a whole, before any rearrangement of source lines happens, because quoted-printable text can be split over several lines (as happens in the example above).
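Why the whole file has to be decoded first can be illustrated with Python's standard quopri module, used here only as a stand-in for qprint. The function name `decode_qp_vcf` is made up for this sketch, and the sample is the made-up vCard shown earlier.

```python
import quopri


def decode_qp_vcf(raw: bytes) -> str:
    """Decode quoted-printable escapes in a complete .vcf file.

    Working on the whole byte string lets the decoder join soft line
    breaks (a trailing '=' followed by a newline) before anything else
    looks at individual lines.
    Caveat: every '=XX' hex pair in the file is decoded, also outside
    properties marked ENCODING=QUOTED-PRINTABLE.
    """
    return quopri.decodestring(raw).decode("utf-8")


SAMPLE = (
    b"BEGIN:VCARD\n"
    b"VERSION:2.1\n"
    b"N;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=47=72=C3=BC=6E=65=62=61=75=6D;"
    b"=42=61=72=74=68=6F=6C=6F=6D=C3=A4=75=73;;;\n"
    b"FN;CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:=42=61=72=74=68=6F=6C=6F=6D"
    b"=C3=A4=75=73=20=47=72=C3=BC=6E=65=62=61=75=\n"   # soft line break here
    b"=6D\n"
    b"TEL;HOME:+493023125010\n"
    b"END:VCARD\n"
)

decoded = decode_qp_vcf(SAMPLE)
print(decoded)
```

After decoding, the FN property sits on a single line again, so a line-oriented conversion to archive entries no longer sees the stray `=6D` continuation as a separate line. Plain UTF-8 text without `=XX` sequences, as in a V4 file, passes through unchanged.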
With the archive file in UTF-8, the script "from archive to vcf" should now generate V4 vCards, and not V2.1 as the original script did. V4 is also a bit stricter about telephone number layout.
Some minor adjustments
I also introduced a few minor adjustments to the scripts, such as normalisation of the + symbol in international telephone numbers and restoring spaces in comment strings.
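What such a normalisation might look like is sketched below. This is only one plausible reading, not the actual rule used in the scripts: the assumption here is that a leading 00 international prefix is rewritten to + and that spaces, dashes and brackets are dropped. The function name `normalise_number` is hypothetical.

```python
import re


def normalise_number(number: str) -> str:
    """Return the number with only digits and '+', using '+' for '00'."""
    compact = re.sub(r"[^\d+]", "", number)   # drop spaces, dashes, brackets
    if compact.startswith("00"):              # assumed international prefix
        compact = "+" + compact[2:]
    return compact


print(normalise_number("0049 30 23125010"))    # -> +493023125010
print(normalise_number("+49 (30) 2312-5010"))  # -> +493023125010
```

With one spelling per number, a plain textual comparison is enough to spot duplicates.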
My variants of the scripts
Most of the issues mentioned above have been accepted and included, in some form, by the main author.
But of course, shell scripts can be, often are, and these certainly are, continually adapted to the user's own needs, causing local versions to diverge from their originals.
To reduce confusion, I gave my local, modified copies of Vermaden's three scripts slightly different names.
All three of them can be found in the vcfhandling-<date>.tgz compressed tar file(s), downloadable below.
The main differences (at time of writing, 2024-08-27) between my local scripts and those in the original repository are:
- Error/warning/info lines are sent to stderr.
- Where relevant, the error/info/etc. output uses a gnu-cc/gnu-emacs compatible layout. Most editors have a "find next error" function that uses this message layout to jump to the correct source file location.
- The duplicate and consistency checker has been extended to find names without any contact info and to find text in the telephone number field.
- My test data and a test-driving makefile are now (2024-08-27) also published. Unpack that test file in the same directory where the scripts were unpacked and run make. (This test data also uses "drama numbers".)
- The duplicate checker now also checks for duplicate email addresses (2024-08-28).
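The message layout meant here is the GNU convention `file:line: message`. A minimal sketch of a duplicate-number check that reports in this layout on stderr could look as follows. It is not the author's checker: the `|`-separated archive layout, the file name `contacts.archive`, the function name `find_duplicates` and the sample entries are all assumptions for illustration.

```python
import sys
from collections import defaultdict


def find_duplicates(archive_name, lines):
    """Return GNU-style 'file:line: message' strings for repeated TEL values."""
    seen = defaultdict(list)                  # number -> line numbers
    for lineno, line in enumerate(lines, start=1):
        for field in line.split("|"):         # assumed field separator
            name, _, value = field.partition(":")
            if name.split(";")[0] == "TEL" and value:
                seen[value].append(lineno)
    messages = []
    for number, linenos in seen.items():
        for lineno in linenos[1:]:
            messages.append(
                f"{archive_name}:{lineno}: warning: duplicate number "
                f"{number} (first seen on line {linenos[0]})"
            )
    return messages


# Made-up archive entries, for illustration only.
lines = [
    "FN:Adam Example|TEL;CELL:+493023125010",
    "FN:A. Example|TEL;HOME:+493023125010",
    "FN:Zoe Example|TEL:+493023125011",
]

messages = find_duplicates("contacts.archive", lines)
for message in messages:
    print(message, file=sys.stderr)           # diagnostics go to stderr
```

Keeping diagnostics on stderr leaves stdout free for the data itself, and the `file:line:` prefix is what lets an editor jump straight to the offending archive line.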
(Clickable) Listing of the relevant files:
$ ls -lAtr
-rw-rw-r-- 1 pe1aqp pe1aqp 3006 Aug 24 22:26 vcfhandling-20240824.tgz
-rw-rw-r-- 1 pe1aqp pe1aqp 3103 Aug 27 13:39 vcfhandling-20240827.tgz
-rw-rw-r-- 1 pe1aqp pe1aqp 3253 Aug 28 12:47 vcfhandling-20240828.tgz
-rw-rw-r-- 1 pe1aqp pe1aqp 1921 Aug 28 12:47 vcfhandlingtest-20240828.tgz
The Fine Print
I'm a fan of the GPL, and in particular the GPLv3, license. But I've only added small bits to these scripts. Therefore:
- The shell-scripts are published under the FreeBSD license (also known as the 2-clause BSD license) and are mainly copyright of Slawomir Wojciech Wojtczak (also known as "vermaden").
- This web-page is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.