Some are Boojums--

purpose: Grant's less-than-daily weblog.

Thu, 08 Mar 2007

Lower limit of free software packages in the tree?

Ever wonder what portion of the portage tree is free software, and what portion is proprietary? Here's an estimate.

Total number of packages:

feynman grant> paludis --list-packages --repository gentoo | \
> sed -n -e 's/^\* \(.*\)/\1/p' | wc -l

11540

Total number of packages w/ LICENSE containing [gpl|as-is|bsd]:

feynman grant> paludis --list-packages --repository gentoo | \
> sed -n -e 's/^\* \(.*\)/\1/p' | while read a ; \
> do paludis -qM ${a}::gentoo | grep LICENSE: | \
> cut -d: -f2 | grep -i --quiet '[gpl|as-is|bsd]' \
> &&  echo ${a} ; done | wc -l

11164

So, if I did things right, we're looking at roughly 96.7% of the tree. Thanks to ciaranm for the lengthy one-liner, although any mistakes are definitely my own.
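As a quick check of that percentage, here is a minimal sketch that recomputes it from the two counts printed above. The variable names are only illustrative, and awk is used purely for the floating-point division.

```shell
total=11540      # packages listed in the gentoo repository (first count above)
matched=11164    # packages whose LICENSE line passed the grep (second count above)

# Share of matching packages, rounded to one decimal place.
percent=$(awk -v m="$matched" -v t="$total" 'BEGIN { printf "%.1f", 100 * m / t }')
echo "${percent}% of packages matched"
```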

Posted by g2boojum on Thu Mar 8 15:25:54 2007
Thanks to beandog, I now know that I could have just looked at http://spaceparanoids.org/gentoo/gpnl/stats.php?q=license


Posted by Diego Flameeyes Pettenò on Thu Mar 8 15:29:54 2007
Besides the debatable need for cut, the grep should have been put outside the while loop, so that it runs only once, and wc -l could have been replaced by grep's own -c switch.

For anyone interested, I find this command more readable, even if it counts something different, since a package can have a different license per version:

pquery --one-attr=license --raw --all --repo=portdir | egrep -i '(gpl|as-is|bsd|mit)' -c

It returns 18703 out of 22864 ebuilds, which is about 82%.

And yes, I added MIT to the list of licenses, and it still doesn't give a good picture of the quantity of free software; these three or four licenses cover only part of the Free Software licenses, so I wouldn't really claim these stats are truthful, myself.


Posted by Andrew Saunders on Thu Mar 8 18:34:37 2007
Given that stuff like win32codecs falls under the "as-is" umbrella, is it really reasonable to count everything under "as-is" as Free Software?


Posted by Diego Flameeyes Pettenò on Fri Mar 9 08:56:41 2007
It would, if as-is were used consistently. The classic as-is would be:

  Permission to use, copy, modify, and distribute this software and its
  documentation for any purpose and without fee is hereby granted, provided
  that the above copyright notice appears in all copies and that both the
  copyright notice and this permission notice appear in supporting
  documentation, and that the same name not be used in advertising or
  publicity pertaining to distribution of the software without specific,
  written prior permission. We make no representations about the
  suitability of this software for any purpose. It is provided "as is"
  without express or implied warranty.


But it is often used for "All rights reserved" software like win32codecs. Yes, it's an error, as much as it is to use BSD for MIT-licensed stuff (so-called BSD-2) and at the same time for BSD-3 and BSD-4. It's also a mistake to use GPL-2 for both "GNU GPL 2 or later" and "GNU GPL 2 only".


Posted by brian harring on Fri Mar 9 11:43:12 2007
@diego
pquery --raw --all --one-attr license -n --repo=portdir | grep -i '[bsd|gpl|as-is]' wc

is a bit closer to the original version he tried.


Posted by brian harring on Fri Mar 9 11:46:33 2007
heh... yay for lack of sleep
pquery --raw --all --one-attr license -n --repo=portdir | egrep -i '(bsd|gpl|as-is)' | wc

is a fair bit more accurate ;)

closer to 78% via that also, since the version grant used does a char check instead of proper text match...
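To illustrate the "char check" point, here is a small sketch on made-up LICENSE values (not real tree data). The pattern is shortened to gpl and bsd: inside brackets the "s-i" of as-is parses as a character range, which some grep versions may reject outright. A bracket expression matches any line containing one of the listed characters, while the parenthesised form matches the whole words.

```shell
# Hypothetical LICENSE values, one per line; chosen only for illustration.
sample=$(printf '%s\n' 'GPL-2' 'Apache-2.0' 'ZLIB' 'MIT')

# Bracket expression: matches any line containing g, p, l, |, b, s or d.
class_count=$(printf '%s\n' "$sample" | grep -i -c '[gpl|bsd]')

# Alternation: matches only lines containing the text "gpl" or "bsd".
alt_count=$(printf '%s\n' "$sample" | grep -E -i -c '(gpl|bsd)')

echo "bracket expression: $class_count, alternation: $alt_count"
```

On this sample the bracket form also counts Apache-2.0 (it contains a "p") and ZLIB (it contains "l" and "b"), which is the kind of overcounting described above.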


Posted by Diego Flameeyes Pettenò on Fri Mar 9 14:09:39 2007
Still the wc should have been wc -l (think of || licenses), and using -c to egrep is faster and saves one pipe.
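A rough sketch of the difference, again on invented input: the first line imitates an ebuild-style "||" license group, and how pquery actually prints such groups is an assumption here. Counting lines with wc -l or grep -c gives one hit per line, while a word count is inflated by the extra tokens in the group.

```shell
# Hypothetical license lines; the first one is an "either/or" group.
sample=$(printf '%s\n' '|| ( GPL-2 BSD )' 'GPL-2' 'MIT')

# Matching lines, counted through an extra pipe to wc -l.
lines_wc=$(printf '%s\n' "$sample" | grep -E -i '(bsd|gpl)' | wc -l | tr -d ' ')

# The same count from grep itself, with no second process.
lines_grep=$(printf '%s\n' "$sample" | grep -E -i -c '(bsd|gpl)')

# Word count of the matching lines, for contrast with the line counts.
words_wc=$(printf '%s\n' "$sample" | grep -E -i '(bsd|gpl)' | wc -w | tr -d ' ')

echo "wc -l: $lines_wc, grep -c: $lines_grep, wc -w: $words_wc"
```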





All contents Copyright 2006 Grant Goodyear.
This work is licensed under a Creative Commons License.