Friday, 19 May 2017

Bioconductor histories with git-svn

If you are developing software you are probably using version control (if not, do it :). Until 05/2017 Bioconductor is using svn. However, it is migrating to git, and meanwhile a hybrid system is provided: one submits the project through GitHub using the git version control system, while internally Bioconductor uses svn. Here are some experiences of developing for Bioconductor in this configuration.

After following the recommended configuration steps, .git/config ends up as:

[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://github.com/llrs/BioCor.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
    remote = origin
    merge = refs/heads/master
[remote "bioc"]
    url = https://github.com/Bioconductor-mirror/BioCor.git
    fetch = +refs/heads/*:refs/remotes/bioc/*
[svn-remote "devel"]
    url = https://hedgehog.fhcrc.org/bioconductor//trunk/madman/Rpacks/BioCor
    fetch = :refs/remotes/git-svn-devel
[svn-remote "release-3.5"]
    url = https://hedgehog.fhcrc.org/bioconductor//branches/RELEASE_3_5/madman/Rpacks/BioCor
    fetch = :refs/remotes/git-svn-release-3.5
[branch "release-3.5"]
    remote = bioc
    merge = refs/heads/release-3.5
[branch "devel"]
    remote = bioc
    merge = refs/heads/master
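
For reference, some of the entries above can be written by hand with plain git commands. This is only a sketch in a throwaway repository: the temporary directory is made up for illustration, nothing is fetched, and it is not necessarily how the setup recommended by Bioconductor produces the file.

```shell
#!/bin/sh
set -eu

# Throwaway repository, so the real checkout is not touched (hypothetical location).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Adds the [remote "bioc"] section, including its default fetch refspec.
git remote add bioc https://github.com/Bioconductor-mirror/BioCor.git

# Adds the [svn-remote "devel"] section used by git-svn.
git config svn-remote.devel.url "https://hedgehog.fhcrc.org/bioconductor//trunk/madman/Rpacks/BioCor"
git config svn-remote.devel.fetch ":refs/remotes/git-svn-devel"

# Adds the [branch "devel"] section: the local devel branch follows bioc/master.
git config branch.devel.remote bioc
git config branch.devel.merge refs/heads/master

# Read the values back to confirm what was written.
bioc_url=$(git config --get remote.bioc.url)
bioc_fetch=$(git config --get remote.bioc.fetch)
devel_merge=$(git config --get branch.devel.merge)
echo "$bioc_url"
echo "$bioc_fetch"
echo "$devel_merge"
```

Note that the local branch is called devel but merges from refs/heads/master of the bioc remote, which is why devel and master end up being different branches.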

This configuration creates a devel branch forked from Bioconductor's GitHub mirror, which is equivalent to the devel trunk in svn.
However, I develop my package in the master branch, which creates the hassle of bringing (by merge or cherry-pick) the changes (commits) from master to the devel branch for a later release on Bioconductor, or to a release-* branch for a hot fix.
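
Bringing a commit from master over to devel can be sketched as follows. This is a minimal, self-contained illustration in a throwaway repository: the file contents, commit messages and identity are made up, and the local devel branch only stands in for the one that tracks the Bioconductor mirror.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the package checkout (hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example Dev"
git config user.email "dev@example.org"

# Common starting point of master and devel.
printf 'Package: BioCor\nVersion: 0.99.0\n' > DESCRIPTION
git add DESCRIPTION
git commit -q -m "Initial import"
git branch devel

# Day-to-day work happens on master.
echo "Fixed a bug" > NEWS
git add NEWS
git commit -q -m "Fix a bug in the similarity code"
fix=$(git rev-parse HEAD)

# Bring only that commit over to devel.
git checkout -q devel
git cherry-pick "$fix" > /dev/null

picked_subject=$(git log -1 --format=%s devel)
echo "$picked_subject"

# In the real hybrid setup the next step would be to synchronize devel with svn
# (git svn rebase, then git svn dcommit); this needs access to the Bioconductor
# server and is not run here.
```

The cherry-picked commit gets a new hash on devel, so master and devel keep diverging in history even when their content is the same; that is part of the hassle.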

Also make sure to check that you have write permissions on the Bioconductor repository.







Friday, 31 March 2017

GSEA in Bioconductor


Gene Set Enrichment Analysis (GSEA) is a test designed to find out whether the position of a group of genes along a ranked list implies some difference. The best known method is the one maintained by the Broad Institute, as it was the first one widely used in biology and it comes with several collections of gene sets. A gene set is a collection of genes related by either a function or an experiment; its definition is as fuzzy as that of a pathway.
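
To make "the position of a group along a list" concrete, here is a sketch of the enrichment score of the Broad method as I understand it from Subramanian et al. (2005). The notation is introduced here for illustration: the N genes are ranked, r_j is the ranking metric of gene g_j, S is the gene set with N_H genes, and p is a weighting exponent.

```latex
\begin{aligned}
P_{\mathrm{hit}}(S, i)  &= \sum_{\substack{g_j \in S \\ j \le i}} \frac{|r_j|^{p}}{N_R},
  \qquad N_R = \sum_{g_j \in S} |r_j|^{p} \\
P_{\mathrm{miss}}(S, i) &= \sum_{\substack{g_j \notin S \\ j \le i}} \frac{1}{N - N_H} \\
ES(S) &= P_{\mathrm{hit}}(S, i^{*}) - P_{\mathrm{miss}}(S, i^{*}),
  \qquad i^{*} = \arg\max_{i} \left| P_{\mathrm{hit}}(S, i) - P_{\mathrm{miss}}(S, i) \right|
\end{aligned}
```

With p = 0 this reduces to a Kolmogorov-Smirnov-like statistic: walking down the list, the running sum goes up at each gene of the set and down otherwise, and the score is its largest deviation from zero.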

In Bioconductor there is the underused tool biocViews, a set of topics for package classification. We can find a category for GSEA under Software>BiologicalQuestion>GeneSetEnrichment.

This category lists 74 packages at the time of writing, which provide functions for Gene Set Enrichment Analysis. It would take too long (and be too hard for me) to describe all the packages in that category. However, it doesn't include all the packages that perform gene set enrichment.

The first package for GSEA in Bioconductor one should look at is GSEABase, which provides tools for reading files from the Broad Institute and translating the IDs of those gene sets.

There are several types of enrichment analysis (EA or simply enrichment), which can be classified by the null hypothesis (whether it is self-contained or not), by whether it uses the phenotype (that is, whether it is supervised or unsupervised), and by the unit of the enrichment score (whether it is computed for each sample or for all the samples).

We could further classify them by whether they take into account the relationships between the genes, and by whether they take into account the relationships between the gene sets.

I would like to highlight some packages from Bioconductor performing GSEA: limma, GSAR, GSVA, piano, fgsea, and topGO.

From limma I would like to highlight that some of the functions it provides for GSEA take into account the correlation of expression between the genes in the gene set. The functions are mroast, roast, fry, camera and romer. barcodeplot is the function for plotting the enrichment in that package.


The GSAR package is interesting because most of its methods for GSEA are graph/network based. Interesting functions are WWtest, KStest, MDtest, RKStest, RMDtest, AggrFtest and GSNCAtest. The function plotMST2.pathway, which allows one to visualize the network of the gene sets, is also interesting.

From the GSVA package the gsva function is interesting: it allows the use of several methods, such as zScore, PLAGE and its own method, gsva.

The piano package implements in R the same algorithm as the one from the Broad Institute, along with several other methods, in the function runGSA.

From the fgsea package I highlight the speed of the fgsea function and the plotEnrichment function to plot the result.

From topGO I highlight that it is the one that takes best advantage of the structure of the gene ontologies, but it has several bugs (I am trying to improve it here).



Other interesting packages are gage, anamiR, PGSEA, EGSEA, GSEAlm, goseq, sigPathway, ReactomePA, meshes and EWCE.

Sunday, 19 March 2017

BioCor: My first package in Bioconductor


Yesterday I received an amazing email:

Congratulations, BioCor has been added to Bioconductor!


Yes, I had submitted a package for the Bioconductor project at the beginning of the week.

The package calculates similarities between pathways, genes and clusters of genes based on their pathways. A pathway is a group of functionally related proteins, thus these similarities measure the functional similarity of the pathways or genes in question.


If anyone is curious about what the email said, this was the body (I didn't know what to expect when I learned that it would be accepted):


 Hi Lluís,

Congratulations, BioCor has been added to Bioconductor!
Currently, the definitive location for your Bioconductor package is
in our SVN repository. The following information is to help you in
your role as a package maintainer. You’ll need the following
credentials to maintain your package:

Subversion user ID: myuser
Password: mypassword

Package ‘landing pages’

Every package in Bioconductor gets its own landing page. Contents
from your DESCRIPTION file are pulled out to populate this page. Your
package’s permanent URL is

https://bioconductor.org/packages/BioCor/

This URL will redirect to the release landing page of your package
(and until it’s released, the devel landing page); this is the URL
that should be used (in publications, etc.) to refer to your package.
You can also refer specifically to the devel, release, or specific
numbered version of Bioconductor:

https://bioconductor.org/packages/devel/BioCor/
https://bioconductor.org/packages/release/BioCor/
https://bioconductor.org/packages/3.5/BioCor/

Maintaining your package

See
http://bioconductor.org/developers/how-to/source-control#experiment-data-packages
for special instructions relating to ExperimentData package
maintenance.

Bioconductor currently maintains software packages in ‘release’ and
‘devel’ branches of a subversion (svn) repository.

The release branch is meant for end-users. A new release branch is
created once every 6 months, in April and October. At the release, the
current devel version of your package becomes the release
version. Only ‘bug fixes’ are made to the release branch. Since your
package has not gone through a release cycle, you do not yet have a
release branch — your package is only available to users of Bioc
‘devel’.

The devel branch is where new packages are added, and where new
features are added to existing packages. Your package has been added
to the devel branch, and is available immediately to those
Bioconductor users who have chosen to ‘use devel’

Make any changes to the devel branch, and watch the release schedule
http://bioconductor.org/developers/release-schedule/ for details of
the next release.

At the next release, your package code in the devel branch will
become the release version of your package. The release version number
will be changed to 1.0.0. The code in the devel branch will continue,
but with version 1.1.0. If necessary, you’ll continue adding features
or updating your package in the devel branch, creating versions 1.1.1,
1.1.2, …; you’ll port bug fixes (NOT new features, or any change to
the ‘API’ seen by users!) to the release branch, creating versions
1.0.1, 1.0.2, …

This process will repeat at the next release, where the version of
your package available in devel will become version 1.2.0, and the
devel branch will continue with version 1.3.0.

Subversion

Bioconductor packages are maintained under Subversion source
control. Use Subversion (or git, described below) to update your
package; see our short svn guide:

http://bioconductor.org/developers/how-to/source-control/

Your subversion account credentials are at the top of this email, or
are already known to you. The credentials give you read access to the
whole Bioconductor repository and WRITE permissions to the devel
(and eventually release) version of your package.

To update your package in the devel branch, you need to do the
following steps:

a) Install subversion(svn) on your machine, if it is not already installed.

b) Use the following command to checkout your packages files from the
Bioconductor subversion repository.

svn co --username myuser --password XXXXXXXXX https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/BioCor

c) Make the necessary changes to your package.

d) Bump the version from x.y.z to x.y.(z+1) in your package’s
DESCRIPTION file. If the version is not properly changed your
changes will not be made available in the public repository.

e) Build your package tar ball

R CMD build BioCor

f) Check that the changes have produced a package consistent with R’s
check facility

R CMD check BioCor_x.y.(z+1).tar.gz

g) Fix any Warnings or Errors from steps (e) and (f)

h) Check the updated source code in to Subversion

svn ci BioCor

i) Check the build report next day (see point 3 for details)

Please let me know if you have any questions or issues with your SVN
access.

Remember, all new features and bug fixes are made to devel branch of
your package; only tested bug fixes should be ported to the release
branch. When testing your changes, be sure to use the ‘devel’ version
of Bioconductor (http://bioconductor.org/developers/how-to/useDevel)
and the appropriate version of R.

Git and github mirrors

If you prefer to use Git and / or GitHub instead of Subversion, you
can use the Bioconductor Git mirrors which are documented at
http://bioconductor.org/developers/how-to/git-mirror/

Build report

When you make a change to the devel branch of your package, please
remember to bump the version of your package in the DESCRIPTION FILE.
Every day at around 5pm PST, the build system takes a snapshot of all
the packages inside Bioconductor and then the next day after 12 noon
PST http://bioconductor.org/checkResults/ is created containing the
output of R CMD build and check on all platforms for each package.

When reading the above, please pay attention to the date displayed
next to - “Snapshot Date:” and “This page was generated on”. Please
keep an eye on the build/check daily reports for the Bioconductor
devel packages: http://bioconductor.org/checkResults/ and promptly
address any warnings or errors from your packages build report.

RSS feeds:

You can find the RSS feed for software packages at
http://bioconductor.org/rss/build/packages/BioCor.rss

Using the support site and bioc-devel mailing list

Please be sure that you have registered on the support site
https://support.bioconductor.org/accounts/login/. Subscribe to the tag
corresponding to your package by editing your user profile to include
the package name in the ‘My Tags’ field. This way, you will be
notified when someone is asking a question about your package. Please
respond promptly to bug reports or user questions on the support site.
We recommend that you ‘follow’ tags that match your own package (such
as your package name) so that

Please maintain your subscription to the Bioc-devel mailing, so that
you are aware of Bioconductor project and other developments,
http://bioconductor.org/help/support/#bioc-devel. Also, after your
package has passed the build report’s CHECK test for the first time,
you may send a note to Bioc-devel to announce its public availability
(with a short description) so other developers are aware of it.

Updating maintainer status

If for some reason, your email address changes, please update the
maintainer field in your DESCRIPTION file. We may need to reach you if
there are issues building your package (this could happen as a result
of changes to R or to packages you depend on). If we are unable to
contact you for a period of time, we may be forced to remove your
package from Bioconductor.

If you want to add a new maintainer or transfer responsibility to
someone else, please email us at packages@bioconductor.org and clearly
state the new maintainer's name, email address and CC them on the
email.

If you no longer want to maintain your package, please let us know and
we will remove it from Bioconductor, or (with your permission) find
a new maintainer for it. See
http://bioconductor.org/developers/package-end-of-life/

Helpful things to know about Bioconductor

Developer resources: http://bioconductor.org/developers

Bioconductor Newsletter: http://bioconductor.org/help/newsletters/

Upcoming Courses: http://bioconductor.org/help/events/

Course Material: http://bioconductor.org/help/course-materials/

Twitter: https://twitter.com/Bioconductor

Thank you for contributing to the Bioconductor project!

Martin Morgan

Many thanks for accepting the package! I hope you'll find it useful.
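
Step (d) of the email asks to bump the version from x.y.z to x.y.(z+1) in DESCRIPTION before every commit, which is easy to forget. Below is a minimal sketch of how that bump could be automated. The bump_patch helper and the sample DESCRIPTION with version 0.99.3 are made up for illustration; they are not something Bioconductor provides.

```shell
#!/bin/sh
set -eu

# Sample DESCRIPTION in a temporary directory (hypothetical content).
workdir=$(mktemp -d)
desc="$workdir/DESCRIPTION"
printf 'Package: BioCor\nVersion: 0.99.3\nTitle: Example title\n' > "$desc"

# bump_patch FILE: rewrite the "Version: x.y.z" line as "Version: x.y.(z+1)".
# Hypothetical helper; every other line of the file is copied unchanged.
bump_patch() {
    awk '
        /^Version:/ {
            n = split($2, v, ".")      # v[1..n] holds the version components
            v[n] = v[n] + 1            # increase the last component
            out = v[1]
            for (i = 2; i <= n; i++) out = out "." v[i]
            print "Version: " out
            next
        }
        { print }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

bump_patch "$desc"
new_version=$(sed -n 's/^Version: *//p' "$desc")
echo "$new_version"   # prints 0.99.4
```

The sketch only touches the last component, so the jumps done by Bioconductor at release time (for example to 1.0.0 in release and 1.1.0 in devel) are left to them.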