Obj1 25-01-06 miknotes
From Organic Design wiki
Latest revision as of 20:03, 11 December 2010
Notes taken from NuNZ objective one meeting 25 Jan 2006
Piete
- Projected timeline for importing all current data: ~mid-April.
- Lynn is very keen for workflow defining how experimental data gets to the database.
- Spreadsheet layout has been an issue - needs to be standardized so that Al can easily parse stuff into the database.
- More pressure on team leaders to get data submitted once it is complete.
- Mik: can we get "completion" as an option in the database - i.e., can we have data stored that is not currently retrievable?
- Require data upload as criterion for experiment completion?
- Are we able to get detailed work plans on the wiki which document experiment status, projected completion date, actual completion date etc?
- Currently need 1) pre-agreed format for results, 2) ftp upload of data, 3) email to Piete informing him of the upload, 4) parsing into the database via appropriate scripts.
- What happens when experiments are redone?
- Need to have system for tracking problems with data - e.g., comments linked to files in the database.
- Chris T: need revision tags (forward and backwards) to see why various experiments have been redone.
- Lynn: do we need to be looking at the results for different food components blind? i.e., worried about introducing bias when analysing "favourite" fruit.
- Wiki discussions
- Need to sort out structure.
- New users guide.
- Agreed protocols for entering things.
- How to handle PDFs etc?
- link to full text on journal website.
- Parsers for MS office documents?
- Minutes for meetings can be locked after everyone has agreed (really necessary?)
- Also see William's point below.
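The "pre-agreed format" and standardized spreadsheet layout mentioned above could be checked automatically before anything is parsed into the database. A minimal sketch in Python (the column names and rules here are illustrative assumptions, not the project's actual format, which was still to be agreed):

```python
import csv
import io

# Hypothetical required columns for a results file; the real
# pre-agreed format was still to be decided at this meeting.
REQUIRED_COLUMNS = {"experiment_id", "sample_id", "value"}

def validate_results(csv_text):
    """Return a list of problems found in a results file (empty list = OK)."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        problems.append("missing columns: %s" % ", ".join(sorted(missing)))
        return problems
    for i, row in enumerate(reader, start=2):  # data starts on line 2
        if not row["experiment_id"]:
            problems.append("line %d: empty experiment_id" % i)
        try:
            float(row["value"])
        except ValueError:
            problems.append("line %d: value %r is not numeric" % (i, row["value"]))
    return problems
```

A check like this would reduce the pressure on Al to hand-fix each spreadsheet before it can be parsed in.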
Al
- Pointed out that you can http link straight into the database to avoid having to go through the database front page.
- Can add comments (public - everyone, private - just you) for a particular page.
- Can also link to things (spreadsheets etc) on a local drive.
- William - when you upload experimental data, you should add a link to the discussion page in the wiki for discussion of this data.
- Need to sort out standard format for this.
- Marcus and I need to get the analyzed data onto the NuNZ database.
- Need to finalize what the format will be.
- On the upload section of the database there is the option to notify people when the data is uploaded. This solves the problem of notifying Marcus and me that an analysis needs to be performed. I think this is format-dependent though - sometimes people will just upload via ftp and then rely on Al to parse the data into the database "manually".
- Database is open source (postgresql + python) so it can easily be downloaded and set up on a local machine.
- Marcus: could use wiki markup for the link-out in the database, so that links can 1) point directly to wiki pages (using the wiki page name) or 2) be written so that the full URL is 'disguised' (e.g., Google, rather than http://www.google.co.nz; in wiki markup: [http://www.google.co.nz Google]). Particularly good for NCBI links, which are really long. Talked to Al later - the database can already give a link a short name.
Chris T
- Aside - vitamin D deficiency due to lack of exposure to ultra-violet light.
- Creating sick children (babies) etc.
- Sunblock obviously blocks UV.
- UC Davis
- Jim Kaput tasked by NUGO to coordinate databases across studies groups, labs etc.
- Recognition that we need very large studies to solve complex problems.
- Need global standard (e.g., MIAME) so that linking into other peoples databases leads to the retrieval of usable data.
- Not going to have a single repository. But microarrays DO - NCBI, Oncomine etc.
- Data analysis developments
- For every new technology everyone has to rediscover basic statistical principles - design, replication etc.
- Current favourite is Mass Spec data.
- Do you look at entire spectra, or focus on changes in individual peaks between two samples/treatments etc?
- Major variation in protein spectra based on food intake, time of day, etc.
Mik/Marcus
- GeneSpring
- License issue - can NuNZ members from other organizations legally use the software?
- Lynn to contact Phillip Lindsay to figure this out.
- If so, THEN we can think about NuNZ GeneSpring training sessions.
- Marcus and I would probably like to have a look at GeneSpring at some point to see if there are any features we like.
- Automated diagnostics
- Want to be able to produce diagnostic plots for a collection of microarrays when they are entered into the database.
- Provides instant feedback
- Need to refine R scripts so that they can be run on the webserver.
- Would it be better to write a binary executable (to produce the diagnostic plots) which could be run locally by the person performing the experiment?
- I say no: if anyone else needs to look at the plots, they have to be emailed around.
- Also, it means that the data may never make it to the database, particularly if the experiment gets redone, based on the results of the diagnostic plots.
- Plus, there is no record (except on the local machine of the person doing the experiment) of what the diagnostic plots revealed, nor in fact any central record that the data were even produced.
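The R scripts themselves aren't reproduced in these notes; as an illustration of the kind of server-side check intended, here is a sketch in Python (the thresholds and summary fields are assumptions, not the project's real QC criteria):

```python
# Sketch of an automated diagnostic run when arrays are uploaded, so the
# results live with the data in the database rather than on one person's
# machine. Threshold values here are illustrative only.

def diagnose_array(intensities, saturation_level=65535, max_saturated_frac=0.05):
    """Summarise one array's spot intensities and flag obvious problems."""
    n = len(intensities)
    ordered = sorted(intensities)
    median = ordered[n // 2]
    saturated = sum(1 for x in intensities if x >= saturation_level)
    flags = []
    if saturated / n > max_saturated_frac:
        flags.append("too many saturated spots")
    if median == 0:
        flags.append("median intensity is zero")
    return {"n_spots": n, "median": median,
            "saturated_frac": saturated / n, "flags": flags}
```

Running something like this on the webserver at upload time gives the instant feedback described above, while keeping a central record of what the diagnostics showed.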
- Need to get documents onto the wiki.
- R code - how to get the R code on the wiki to be able to be used in the package building process? i.e., can I issue a shell command which will build a package from the current wiki code?
- What happens if there is an error in the code?
- Need to record when packages were last successfully built so that we can track back to the most recent compilable version.
- Probably every time we alter the code we'll need to also build the package (and run a test suite) to ensure that it is behaving in an acceptable way.
- Need to have a development version of the code, and a stable version. All of the activity discussed above will relate to the development version (since the stable version will already work, and will not be being edited).
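The build-and-record step above could be as simple as the following sketch (the build command and log format are assumptions; the real pipeline would call R CMD build/check on the package assembled from the wiki code):

```python
import subprocess
import time

# Sketch of recording successful builds so we can always track back to
# the most recent compilable version. The real build command would be
# something like ["R", "CMD", "build", "pkgdir"]; here it is a parameter.

def build_and_record(build_cmd, version, log):
    """Run a package build; on success, append (timestamp, version) to log.

    `log` is a list of (timestamp, version) pairs for successful builds,
    so the most recent compilable version is always log[-1].
    """
    result = subprocess.run(build_cmd, capture_output=True)
    ok = result.returncode == 0
    if ok:
        log.append((time.time(), version))
    return ok

def last_compilable(log):
    """Version string of the most recent successful build, or None."""
    return log[-1][1] if log else None
```

A test suite run would slot in after the build in the same way, appending to the log only when both succeed.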
- Workflow documentation so that users can interactively edit the workflow, make suggestions/requests etc.
- What pathway/function based analysis methods are going to be applied to the results?
- We can do stuff in Bioconductor.
- Others probably have their own favourite tools.
- Is this an analysis component that needs to be done at the same time as the toptable stuff?
- What other options are there for these types of analysis?
Martin
- High-throughput screen.
- Sounds like a lot of data needs to be analysed.
- Currently Martin is doing things "manually", but it sounds like there is plenty of scope for automation of this process.
- Chris T and Brian will probably handle this stuff.
- Issues involving replication (expensive) and randomization....
- Currently the initial unreplicated screen at different dilutions (of salicylic acid) is used to raise flags, with things that look promising then investigated further.
- Should also have support based on neighbouring dilutions (wouldn't just get an effect at a single dilution - should also see some effect on either side).
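The neighbouring-dilution idea above can be sketched as a simple flagging rule (the effect-size thresholds are illustrative assumptions, not values from the actual screen):

```python
# Sketch of "hit" flagging with support from neighbouring dilutions: a
# response at one dilution only counts if at least one adjacent dilution
# also shows some effect. Threshold values are illustrative only.

def flag_hits(responses, hit_threshold=2.0, support_threshold=1.2):
    """responses: effect sizes ordered by dilution.

    Return indices of dilutions that exceed hit_threshold AND have at
    least one neighbouring dilution above support_threshold.
    """
    hits = []
    for i, r in enumerate(responses):
        if r < hit_threshold:
            continue
        neighbours = responses[max(0, i - 1):i] + responses[i + 1:i + 2]
        if any(n >= support_threshold for n in neighbours):
            hits.append(i)
    return hits
```

An isolated spike at a single dilution would not be flagged under this rule, while a peak with some effect on either side would be.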
- Questions:
- Storage/availability of data.
- Has talked to Piete about getting a standard format for upload into the database.
- How to automate "hit-picking".
- Surely this is what the statisticians can do?
- It's okay - Chris has this covered.
- But wouldn't it be cool to train up an expert system to behave like Martin?
- Will ALL of this data be put on the database?
Chris T: high throughput screens
- Workflow etc.
- Expert system approach to detecting screens which are "interesting"
- Write a paper comparing Chris's semi-directed scheme versus a standard machine learning approach.
- ML approach will also return the actual rules that it used to do the classification.
- LET'S MAKE AN AUTOMARTIN!
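A toy version of the "AutoMartin" idea: learn a single threshold rule from labelled screens and report the rule it used, mirroring the point that an ML approach returns the actual rules it classified with. The feature values, labels, and rule form are illustrative assumptions, not the real pipeline:

```python
# Minimal one-rule learner: find the threshold on a single feature that
# best separates hits (True) from non-hits (False), and return the rule
# in human-readable form alongside its training accuracy.

def learn_threshold_rule(values, labels):
    """Return (threshold, accuracy, rule_text) for the best single rule."""
    best = (None, -1.0, "")
    for t in sorted(set(values)):
        predictions = [v >= t for v in values]
        acc = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
        if acc > best[1]:
            best = (t, acc, "hit if value >= %g" % t)
    return best
```

A real comparison with Chris's semi-directed scheme would use a proper rule-learning method over many features, but the principle - that the learned rules come back in inspectable form - is the same.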
Ivonne
- Need to have codes specific to collections of samples from different sites (e.g., Auckland and Christchurch).
- Problem is that there are heavy restrictions on the ChCh data, so we need to be able to identify very easily where the samples came from.
- Samples:
- NuNZ: 77 patients, 9 controls + 121 controls (A. Shelling).
- Krissansen Lab (Auckland): 198 patients.
- Christchurch: 288+ patients, 480 soon. 196 controls.
- Discussion about the use of buccal swabs versus blood samples.
- Need to get a large number of control samples to get an accurate estimate of background allele frequencies.
- Brian has said that we need to concentrate on the Caucasian population, because we don't have enough power to find anything else.
- Action plan: Mik, Marcus, Brian and Ivonne will produce workflow for the SNP analysis.
Brian
- Tests
- Hardy-Weinberg (HW) equilibrium test.
- Controls expected to be in HWE.
- Any departure needs to be investigated.
- Not powerful - will only detect a small proportion of possible problems.
- Genotype and allelic tests
- Dominant effects
- Recessive effects
- Differences in genotype effects between cases and controls.
- Allelic - additive allelic effects.
- Cochran-Mantel-Haenszel test - differences between AKL and CHCH?
- Conditions on location of samples.
- MCP
- Permutation test.
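The HW equilibrium check on control genotypes can be sketched as a 1-df chi-square comparison of observed genotype counts against those expected from the allele frequencies (the counts in the usage example are made up for illustration):

```python
# Sketch of a Hardy-Weinberg equilibrium test: estimate allele
# frequencies from the observed genotype counts, derive the expected
# genotype counts under HWE, and compute a chi-square statistic.

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the AA, AB, and BB genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A statistic above about 3.84 (the 5% critical value for 1 df) in the controls would be the kind of departure that needs investigating, per the notes above.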
- Lab practice
- Genotyping errors.
- Duplicate samples.