2 Introducing the Shell
2.1 What is a shell and why should I care?
A shell is a computer program that presents a command line interface which allows you to control your computer using commands entered with a keyboard instead of controlling graphical user interfaces (GUIs) with a mouse/keyboard/touchscreen combination.
There are many reasons to learn about the shell:
Many bioinformatics tools can only be used through a command line interface. Many more have features and parameter options which are not available in the GUI. BLAST is one example: many of its advanced functions are only accessible to users who know how to use a shell.
The shell makes your work less boring. In bioinformatics you often need to repeat tasks with a large number of files. With the shell, you can automate those repetitive tasks, which leaves you free to do more exciting things.
The shell makes your work less error-prone. When humans do the same thing a hundred different times (or even ten times), they’re likely to make a mistake. Your computer can do the same thing a thousand times with no mistakes.
The shell makes your work more reproducible. When you carry out your work on the command line (rather than in a GUI), your computer keeps a record of every step that you’ve carried out, which you can use to re-do your work when you need to. It also gives you a way to communicate unambiguously what you’ve done, so that others can inspect or apply your process to new data.
Many bioinformatic tasks require large amounts of computing power and can’t realistically be run on your own machine. These tasks are best performed using remote computers or cloud computing, which can only be accessed through a shell.
In this lesson you will learn how to use the command line interface to move around in your file system.
2.2 How to access the shell
On a Mac or Linux machine, you can access a shell through a program called “Terminal”, which is already available on your computer. The Terminal is a window into which we will type commands. If you’re using Windows, you’ll need to download a separate program to access the shell.
2.4 Tip
Hot-key combinations are shortcuts for performing common commands.
The hot-key combination for clearing the console is Ctrl+L. Feel free to try it and see for yourself.
Let’s find out where we are by running a command called pwd (which stands for “print working directory”). At any moment, our current working directory is our current default directory, i.e., the directory that the computer assumes we want to run commands in, unless we explicitly specify something else. Here, the computer’s response is /Users/your-username, which is the top-level directory within our cloud system:
cd
pwd
## /Users/ggiaever
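As a minimal sketch of how the working directory changes as we move around, the snippet below creates a throwaway directory, moves into it, and asks pwd where we are. The scratch directory made with mktemp and the variable names are assumptions chosen for illustration; they are not part of the workshop data.

```shell
# Create a throwaway directory (hypothetical, for illustration only).
scratch_dir=$(mktemp -d)

# Move into it; from now on, commands run relative to this directory.
cd "$scratch_dir"

# Ask the shell for the current working directory and keep the answer.
current_dir=$(pwd)
echo "Working directory is now: $current_dir"
```

The printed path ends with the name of the scratch directory, which shows that pwd reports wherever the most recent cd took us.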
Let’s look at how our file system is organized. We can see what files and subdirectories are in this directory by running ls, which stands for “listing”:
ls
## 01-introduction.Rmd
## 02-the-filesystem.Rmd
## 03-working-with-files.Rmd
## 04-redirection.Rmd
## 05-writing-scripts.Rmd
## 06-organization.Rmd
## Shell_genomics_answers.Rmd
## Shell_genomics_exercises.Rmd
## _book
## _bookdown.yml
## _output.yml
## css
## fig
## index.Rmd
## index.md
## render920c7c54982c.rds
## rsconnect
## shell-genomics.rds
## shell_homework.txt
ls prints the names of the files and directories in the current directory in alphabetical order, arranged neatly into columns.
We’ll be working within the ~/shell_data subdirectory, and creating new subdirectories, throughout this workshop.
The command to change locations in our file system is cd, followed by a directory name to change our working directory. cd stands for “change directory”.
Let’s say we want to navigate to the shell_data directory mentioned above. We can use the following command to get there:
cd ~/shell_data
Let’s look at what is in this directory:
cd ~/shell_data
ls
## sra_metadata
## untrimmed_fastq
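The sketch below illustrates, under stated assumptions, two ways of naming a directory for cd: with a full path, or with a name that is looked up inside the current working directory. The practice_home directory is a hypothetical stand-in for a home directory, so the real ~/shell_data is left untouched.

```shell
# Hypothetical stand-in for a home directory that contains shell_data.
practice_home=$(mktemp -d)
mkdir -p "$practice_home/shell_data/untrimmed_fastq"

# A full path works no matter where we currently are.
cd "$practice_home/shell_data"

# A bare directory name is looked up inside the current working directory.
cd untrimmed_fastq

# Confirm where we ended up.
where=$(pwd)
echo "$where"
```

Both forms end in the same place; the second simply saves typing once we are already inside shell_data.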
We can make the ls output more comprehensible by using the flag -F, which tells ls to add a trailing / to the names of directories:
cd ~/shell_data
ls -F
## sra_metadata/
## untrimmed_fastq/
Anything with a “/” after it is a directory. Things with a “*” after them are programs. If there are no decorations, it’s a file.
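To see all three cases side by side, here is a small sketch that builds a scratch directory containing one subdirectory, one executable script, and one plain file, then lists it with ls -F. The file and directory names are hypothetical and exist only for this illustration.

```shell
# Build a throwaway directory with one item of each kind (names are made up).
scratch_dir=$(mktemp -d)
mkdir "$scratch_dir/results"                 # a directory
touch "$scratch_dir/run_analysis.sh"         # an empty file that we mark as a program
chmod +x "$scratch_dir/run_analysis.sh"      # set the executable permission
touch "$scratch_dir/notes.txt"               # an ordinary file

# List the contents with type decorations and keep a copy of the output.
listing=$(ls -F "$scratch_dir")
echo "$listing"
# Expected decorations:
#   notes.txt           (no decoration: plain file)
#   results/            (trailing slash: directory)
#   run_analysis.sh*    (trailing asterisk: executable)
```

Note that the asterisk reflects the executable permission on the file, not its .sh extension.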
ls has lots of other options. To find out what they are, we can type:
man ls
## LS(1) General Commands Manual LS(1)
##
## NAME
## ls - list directory contents
##
## SYNOPSIS
## ls [-@ABCFGHILOPRSTUWabcdefghiklmnopqrstuvwxy1%,] [--color=when]
## [-D format] [file ...]
##
## DESCRIPTION
## For each operand that names a file of a type other than directory, ls
## displays its name as well as any requested, associated information. For
## each operand that names a file of type directory, ls displays the names of
## files contained within that directory, as well as any requested, associated
## information.
##
## If no operands are given, the contents of the current directory are
## displayed. If more than one operand is given, non-directory operands are
## displayed first; directory and non-directory operands are sorted separately
## and in lexicographical order.
##
## The following options are available:
##
## -@ Display extended attribute keys and sizes in long (-l) output.
##
## -A Include directory entries whose names begin with a dot (`.') except
## for . and ... Automatically set for the super-user unless -I is
## specified.
##
## -B Force printing of non-printable characters (as defined by ctype(3)
## and current locale settings) in file names as \xxx, where xxx is
## the numeric value of the character in octal. This option is not
## defined in IEEE Std 1003.1-2008 ("POSIX.1").
##
## -C Force multi-column output; this is the default when output is to a
## terminal.
##
## -D format
## When printing in the long (-l) format, use format to format the
## date and time output. The argument format is a string used by
## strftime(3). Depending on the choice of format string, this may
## result in a different number of columns in the output. This option
## overrides the -T option. This option is not defined in IEEE Std
## 1003.1-2008 ("POSIX.1").
##
## -F Display a slash (`/') immediately after each pathname that is a
## directory, an asterisk (`*') after each that is executable, an at
## sign (`@') after each symbolic link, an equals sign (`=') after
## each socket, a percent sign (`%') after each whiteout, and a
## vertical bar (`|') after each that is a FIFO.
##
## -G Enable colorized output. This option is equivalent to defining
## CLICOLOR or COLORTERM in the environment and setting --color=auto.
## (See below.) This functionality can be compiled out by removing
## the definition of COLORLS. This option is not defined in IEEE Std
## 1003.1-2008 ("POSIX.1").
##
## -H Symbolic links on the command line are followed. This option is
## assumed if none of the -F, -d, or -l options are specified.
##
## -I Prevent -A from being automatically set for the super-user. This
## option is not defined in IEEE Std 1003.1-2008 ("POSIX.1").
##
## -L Follow all symbolic links to final target and list the file or
## directory the link references rather than the link itself. This
## option cancels the -P option.
##
## -O Include the file flags in a long (-l) output. This option is
## incompatible with IEEE Std 1003.1-2008 ("POSIX.1"). See chflags(1)
## for a list of file flags and their meanings.
##
## -P If argument is a symbolic link, list the link itself rather than
## the object the link references. This option cancels the -H and -L
## options.
##
## -R Recursively list subdirectories encountered.
##
## -S Sort by size (largest file first) before sorting the operands in
## lexicographical order.
##
## -T When printing in the long (-l) format, display complete time
## information for the file, including month, day, hour, minute,
## second, and year. The -D option gives even more control over the
## output format. This option is not defined in IEEE Std 1003.1-2008
## ("POSIX.1").
##
## -U Use time when file was created for sorting or printing. This
## option is not defined in IEEE Std 1003.1-2008 ("POSIX.1").
##
## -W Display whiteouts when scanning directories. This option is not
## defined in IEEE Std 1003.1-2008 ("POSIX.1").
##
## -a Include directory entries whose names begin with a dot (`.').
##
## -b As -B, but use C escape codes whenever possible. This option is
## not defined in IEEE Std 1003.1-2008 ("POSIX.1").
##
## -c Use time when file status was last changed for sorting or printing.
##
## --color=when
## Output colored escape sequences based on when, which may be set to
## either always, auto, or never.
##
## always will make ls always output color. If TERM is unset or set
## to an invalid terminal, then ls will fall back to explicit ANSI
## escape sequences without the help of termcap(5). always is the
## default if --color is specified without an argument.
##
## auto will make ls output escape sequences based on termcap(5), but
## only if stdout is a tty and either the -G flag is specified or the
## COLORTERM environment variable is set and not empty.
##
## never will disable color regardless of environment variables.
## never is the default when neither --color nor -G is specified.
##
## For compatibility with GNU coreutils, ls supports yes or force as
## equivalent to always, no or none as equivalent to never, and tty or
## if-tty as equivalent to auto.
##
## -d Directories are listed as plain files (not searched recursively).
##
## -e Print the Access Control List (ACL) associated with the file, if
## present, in long (-l) output.
##
## -f Output is not sorted. This option turns on -a. It also negates
## the effect of the -r, -S and -t options. As allowed by IEEE Std
## 1003.1-2008 ("POSIX.1"), this option has no effect on the -d, -l,
## -R and -s options.
##
## -g This option has no effect. It is only available for compatibility
## with 4.3BSD, where it was used to display the group name in the
## long (-l) format output. This option is incompatible with IEEE Std
## 1003.1-2008 ("POSIX.1").
##
## -h When used with the -l option, use unit suffixes: Byte, Kilobyte,
## Megabyte, Gigabyte, Terabyte and Petabyte in order to reduce the
## number of digits to four or fewer using base 2 for sizes. This
## option is not defined in IEEE Std 1003.1-2008 ("POSIX.1").
##
## -i For each file, print the file's file serial number (inode number).
##
## -k This has the same effect as setting environment variable BLOCKSIZE
## to 1024, except that it also nullifies any -h options to its left.
##
## -l (The lowercase letter "ell".) List files in the long format, as
## described in the The Long Format subsection below.
##
## -m Stream output format; list files across the page, separated by
## commas.
##
## -n Display user and group IDs numerically rather than converting to a
## user or group name in a long (-l) output. This option turns on the
## -l option.
##
## -o List in long format, but omit the group id.
##
## -p Write a slash (`/') after each filename if that file is a
## directory.
##
## -q Force printing of non-graphic characters in file names as the
## character `?'; this is the default when output is to a terminal.
##
## -r Reverse the order of the sort.
##
## -s Display the number of blocks used in the file system by each file.
## Block sizes and directory totals are handled as described in The
## Long Format subsection below, except (if the long format is not
## also requested) the directory totals are not output when the output
## is in a single column, even if multi-column output is requested.
## (-l) format, display complete time information for the file,
## including month, day, hour, minute, second, and year. The -D
## option gives even more control over the output format. This option
## is not defined in IEEE Std 1003.1-2008 ("POSIX.1").
##
## -t Sort by descending time modified (most recently modified first).
## If two files have the same modification timestamp, sort their names
## in ascending lexicographical order. The -r option reverses both of
## these sort orders.
##
## Note that these sort orders are contradictory: the time sequence is
## in descending order, the lexicographical sort is in ascending
## order. This behavior is mandated by IEEE Std 1003.2 ("POSIX.2").
## This feature can cause problems listing files stored with
## sequential names on FAT file systems, such as from digital cameras,
## where it is possible to have more than one image with the same
## timestamp. In such a case, the photos cannot be listed in the
## sequence in which they were taken. To ensure the same sort order
## for time and for lexicographical sorting, set the environment
## variable LS_SAMESORT or use the -y option. This causes ls to
## reverse the lexicographical sort order when sorting files with the
## same modification timestamp.
##
## -u Use time of last access, instead of time of last modification of
## the file for sorting (-t) or long printing (-l).
##
## -v Force unedited printing of non-graphic characters; this is the
## default when output is not to a terminal.
##
## -w Force raw printing of non-printable characters. This is the
## default when output is not to a terminal. This option is not
## defined in IEEE Std 1003.1-2001 ("POSIX.1").
##
## -x The same as -C, except that the multi-column output is produced
## with entries sorted across, rather than down, the columns.
##
## -y When the -t option is set, sort the alphabetical output in the same
## order as the time output. This has the same effect as setting
## LS_SAMESORT. See the description of the -t option for more
## details. This option is not defined in IEEE Std 1003.1-2001
## ("POSIX.1").
##
## -% Distinguish dataless files and directories with a '%' character in
## long
##
## -1 (The numeric digit "one".) Force output to be one entry per line.
## This is the default when output is not to a terminal. (-l) output,
## and don't materialize dataless directories when listing them.
##
## -, (Comma) When the -l option is set, print file sizes grouped and
## separated by thousands using the non-monetary separator returned by
## localeconv(3), typically a comma or period. If no locale is set,
## or the locale does not have a non-monetary separator, this option
## has no effect. This option is not defined in IEEE Std 1003.1-2001
## ("POSIX.1").
##
## The -1, -C, -x, and -l options all override each other; the last one
## specified determines the format used.
##
## The -c, -u, and -U options all override each other; the last one specified
## determines the file time used.
##
## The -S and -t options override each other; the last one specified
## determines the sort order used.
##
## The -B, -b, -w, and -q options all override each other; the last one
## specified determines the format used for non-printable characters.
##
## The -H, -L and -P options all override each other (either partially or
## fully); they are applied in the order specified.
##
## By default, ls lists one entry per line to standard output; the exceptions
## are to terminals or when the -C or -x options are specified.
##
## File information is displayed with one or more <blank>s separating the
## information associated with the -i, -s, and -l options.
##
## The Long Format
## If the -l option is given, the following information is displayed for each
## file: file mode, number of links, owner name, group name, number of bytes
## in the file, abbreviated month, day-of-month file was last modified, hour
## file last modified, minute file last modified, and the pathname. If the
## file or directory has extended attributes, the permissions field printed by
## the -l option is followed by a '@' character. Otherwise, if the file or
## directory has extended security information (such as an access control
## list), the permissions field printed by the -l option is followed by a '+'
## character. If the -% option is given, a '%' character follows the
## permissions field for dataless files and directories, possibly replacing
## the '@' or '+' character.
##
## If the modification time of the file is more than 6 months in the past or
## future, and the -D or -T are not specified, then the year of the last
## modification is displayed in place of the hour and minute fields.
##
## If the owner or group names are not a known user or group name, or the -n
## option is given, the numeric ID's are displayed.
##
## If the file is a character special or block special file, the device number
## for the file is displayed in the size field. If the file is a symbolic
## link the pathname of the linked-to file is preceded by "->".
##
## The listing of a directory's contents is preceded by a labeled total number
## of blocks used in the file system by the files which are listed as the
## directory's contents (which may or may not include . and .. and other files
## which start with a dot, depending on other options).
##
## The default block size is 512 bytes. The block size may be set with option
## -k or environment variable BLOCKSIZE. Numbers of blocks in the output will
## have been rounded up so the numbers of bytes is at least as many as used by
## the corresponding file system blocks (which might have a different size).
##
## The file mode printed under the -l option consists of the entry type and
## the permissions. The entry type character describes the type of file, as
## follows:
##
## - Regular file.
## b Block special file.
## c Character special file.
## d Directory.
## l Symbolic link.
## p FIFO.
## s Socket.
## w Whiteout.
##
## The next three fields are three characters each: owner permissions, group
## permissions, and other permissions. Each field has three character
## positions:
##
## 1. If r, the file is readable; if -, it is not readable.
##
## 2. If w, the file is writable; if -, it is not writable.
##
## 3. The first of the following that applies:
##
## S If in the owner permissions, the file is not
## executable and set-user-ID mode is set. If in the
## group permissions, the file is not executable and
## set-group-ID mode is set.
##
## s If in the owner permissions, the file is executable
## and set-user-ID mode is set. If in the group
## permissions, the file is executable and setgroup-ID
## mode is set.
##
## x The file is executable or the directory is
## searchable.
##
## - The file is neither readable, writable, executable,
## nor set-user-ID nor set-group-ID mode, nor sticky.
## (See below.)
##
## These next two apply only to the third character in the last
## group (other permissions).
##
## T The sticky bit is set (mode 1000), but not execute
## or search permission. (See chmod(1) or sticky(7).)
##
## t The sticky bit is set (mode 1000), and is searchable
## or executable. (See chmod(1) or sticky(7).)
##
## The next field contains a plus (`+') character if the file has an ACL, or a
## space (` ') if it does not. The ls utility does not show the actual ACL;
## use getfacl(1) to do this.
##
## ENVIRONMENT
## The following environment variables affect the execution of ls:
##
## BLOCKSIZE If this is set, its value, rounded up to 512 or down to
## a multiple of 512, will be used as the block size in
## bytes by the -l and -s options. See The Long Format
## subsection for more information.
##
## CLICOLOR Use ANSI color sequences to distinguish file types.
## See LSCOLORS below. In addition to the file types
## mentioned in the -F option some extra attributes
## (setuid bit set, etc.) are also displayed. The
## colorization is dependent on a terminal type with the
## proper termcap(5) capabilities. The default "cons25"
## console has the proper capabilities, but to display the
## colors in an xterm(1), for example, the TERM variable
## must be set to "xterm-color". Other terminal types may
## require similar adjustments. Colorization is silently
## disabled if the output is not directed to a terminal
## unless the CLICOLOR_FORCE variable is defined or
## --color is set to "always".
##
## CLICOLOR_FORCE Color sequences are normally disabled if the output is
## not directed to a terminal. This can be overridden by
## setting this variable. The TERM variable still needs
## to reference a color capable terminal however otherwise
## it is not possible to determine which color sequences
## to use.
##
## COLORTERM See description for CLICOLOR above.
##
## COLUMNS If this variable contains a string representing a
## decimal integer, it is used as the column position
## width for displaying multiple-text-column output. The
## ls utility calculates how many pathname text columns to
## display based on the width provided. (See -C and -x.)
##
## LANG The locale to use when determining the order of day and
## month in the long -l format output. See environ(7) for
## more information.
##
## LSCOLORS The value of this variable describes what color to use
## for which attribute when colors are enabled with
## CLICOLOR or COLORTERM. This string is a concatenation
## of pairs of the format fb, where f is the foreground
## color and b is the background color.
##
## The color designators are as follows:
##
## a black
## b red
## c green
## d brown
## e blue
## f magenta
## g cyan
## h light grey
## A bold black, usually shows up as dark grey
## B bold red
## C bold green
## D bold brown, usually shows up as yellow
## E bold blue
## F bold magenta
## G bold cyan
## H bold light grey; looks like bright white
## x default foreground or background
##
## Note that the above are standard ANSI colors. The
## actual display may differ depending on the color
## capabilities of the terminal in use.
##
## The order of the attributes are as follows:
##
## 1. directory
## 2. symbolic link
## 3. socket
## 4. pipe
## 5. executable
## 6. block special
## 7. character special
## 8. executable with setuid bit set
## 9. executable with setgid bit set
## 10. directory writable to others, with sticky
## bit
## 11. directory writable to others, without sticky
## bit
##
## The default is "exfxcxdxbxegedabagacad", i.e., blue
## foreground and default background for regular
## directories, black foreground and red background for
## setuid executables, etc.
##
## LS_COLWIDTHS If this variable is set, it is considered to be a
## colon-delimited list of minimum column widths.
## Unreasonable and insufficient widths are ignored (thus
## zero signifies a dynamically sized column). Not all
## columns have changeable widths. The fields are, in
## order: inode, block count, number of links, user name,
## group name, flags, file size, file name.
##
## LS_SAMESORT If this variable is set, the -t option sorts the names
## of files with the same modification timestamp in the
## same sense as the time sort. See the description of
## the -t option for more details.
##
## TERM The CLICOLOR and COLORTERM functionality depends on a
## terminal type with color capabilities.
##
## TZ The timezone to use when displaying dates. See
## environ(7) for more information.
##
## EXIT STATUS
## The ls utility exits 0 on success, and >0 if an error occurs.
##
## EXAMPLES
## List the contents of the current working directory in long format:
##
## $ ls -l
##
## In addition to listing the contents of the current working directory in
## long format, show inode numbers, file flags (see chflags(1)), and suffix
## each filename with a symbol representing its file type:
##
## $ ls -lioF
##
## List the files in /var/log, sorting the output such that the most recently
## modified entries are printed first:
##
## $ ls -lt /var/log
##
## COMPATIBILITY
## The group field is now automatically included in the long listing for files
## in order to be compatible with the IEEE Std 1003.2 ("POSIX.2")
## specification.
##
## LEGACY DESCRIPTION
## In legacy mode, the -f option does not turn on the -a option and the -g,
## -n, and -o options do not turn on the -l option.
##
## Also, the -o option causes the file flags to be included in a long (-l)
## output; there is no -O option.
##
## When -H is specified (and not overridden by -L or -P) and a file argument
## is a symlink that resolves to a non-directory file, the output will reflect
## the nature of the link, rather than that of the file. In legacy operation,
## the output will describe the file.
##
## For more information about legacy mode, see compat(5).
##
## SEE ALSO
## chflags(1), chmod(1), getfacl(1), sort(1), xterm(1), localeconv(3),
## strftime(3), strmode(3), compat(5), termcap(5), sticky(7), symlink(7)
##
## STANDARDS
## With the exception of options -g, -n and -o, the ls utility conforms to
## IEEE Std 1003.1-2001 ("POSIX.1") and IEEE Std 1003.1-2008 ("POSIX.1"). The
## options -B, -D, -G, -I, -T, -U, -W, -Z, -b, -h, -w, -y and -, are
## non-standard extensions.
##
## The ACL support is compatible with IEEE Std 1003.2c ("POSIX.2c") Draft 17
## (withdrawn).
##
## HISTORY
## An ls command appeared in Version 1 AT&T UNIX.
##
## BUGS
## To maintain backward compatibility, the relationships between the many
## options are quite complex.
##
## The exception mentioned in the -s option description might be a feature
## that was based on the fact that single-column output usually goes to
## something other than a terminal. It is debatable whether this is a design
## bug.
##
## IEEE Std 1003.2 ("POSIX.2") mandates opposite sort orders for files with
## the same timestamp when sorting with the -t option.
##
## macOS 12.6 August 31, 2020 macOS 12.6
man (short for manual) displays detailed documentation (also referred to as a man page or man file) for zsh commands. It is a powerful resource for exploring zsh commands and understanding their usage and flags. Some manual files are very long. You can scroll through the file using your keyboard’s down arrow, or use the Space or f key to go forward one page and the b key to go back one page. When you are done reading, hit the q key to quit.
Navigating the manual:
Space, f, or Control ⌃-f to advance one page
d or Control ⌃-d to advance half a page
b or Control ⌃-b to go back one page
u or Control ⌃-u to go back half a page
2.5 Challenge
Use the -l option for the ls command to display more information for each item in the directory. What is one piece of additional information this long format gives you that you don’t see with the bare ls command?
2.6 Solution
cd ~/shell_data
ls -l
## total 0
## drwxr-x---@ 3 ggiaever staff 96 Jul 30 2015 sra_metadata
## drwxr-xr-x@ 24 ggiaever staff 768 Nov 26 18:03 untrimmed_fastq
No one can possibly learn all of these arguments; that’s what the manual page is for. You can (and should) refer to the manual page or other help files as needed.
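As a rough sketch of how to read the long format, the snippet below lists a scratch directory with ls -l and then picks out two of the fields described in the manual page: the entry type (the first character of the file mode) and the size in bytes (the fifth column). The scratch directory, the sample.txt file, and the variable names are hypothetical, and the use of cut and awk to isolate fields is an assumption made for illustration rather than something this lesson has introduced.

```shell
# Build a throwaway directory with one subdirectory and one small file.
scratch_dir=$(mktemp -d)
mkdir "$scratch_dir/sra_metadata"
printf 'ACGT\n' > "$scratch_dir/sample.txt"   # 4 letters plus a newline

# Show the full long listing.
ls -l "$scratch_dir"

# The first character of the file mode is the entry type ("d" for a directory).
# The -d flag makes ls describe the directory itself rather than its contents.
entry_type=$(ls -ld "$scratch_dir/sra_metadata" | cut -c1)

# The fifth whitespace-separated column is the size in bytes.
size_bytes=$(ls -l "$scratch_dir/sample.txt" | awk '{print $5}')

echo "sra_metadata entry type: $entry_type"
echo "sample.txt size in bytes: $size_bytes"
```

Neither the entry type nor the size appears in the output of the bare ls command, which is what makes the long format worth the extra width.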
Let’s go into the untrimmed_fastq directory and see what is in there.
cd ~/shell_data/untrimmed_fastq
ls -F
## SRR097977.fastq
## SRR098026.fastq
## bad-reads-script.sh*
## bad_reads_2022_2022.txt
## bad_reads_2022_2022_2022_2022.txt
## bad_reads_2022_2022_2022_2022_2022_2022.txt
## bad_reads_2022_2022_2022_2022_2022_2022_2022_2022.txt
## scripted_bad_reads.txt
## scripted_bad_reads_2022.txt
## scripted_bad_reads_2022_2022_2022.txt
## scripted_bad_reads_2022_2022_2022_2022_2022.txt
## scripted_bad_reads_2022_2022_2022_2022_2022_2022_2022_2022.txt
## seq_info_2022_2022.txt
## seq_info_2022_2022_2022_2022.txt
## seq_info_2022_2022_2022_2022_2022_2022.txt
## seq_info_2022_2022_2022_2022_2022_2022_2022_2022.txt
## species_EnsemblBacteria.txt
## species_EnsemblBacteria_2022.txt
## species_EnsemblBacteria_2022_2022_2022.txt
## species_EnsemblBacteria_2022_2022_2022_2022_2022.txt
## species_EnsemblBacteria_2022_2022_2022_2022_2022_2022_2022_2022.txt
This directory contains two files with .fastq extensions. FASTQ is a format for storing information about sequencing reads and their quality. We will be learning more about FASTQ files in a later lesson.
2.6.1 Shortcut: Tab Completion
Typing out file or directory names can waste a lot of time and it’s easy to make typing mistakes. Instead we can use tab complete as a shortcut. When you start typing out the name of a directory or file, then hit the Tab key, the shell will try to fill in the rest of the directory or file name.
Return to your home directory:
cd ~
then enter:
cd sh<tab>
The shell will fill in the rest of the directory name for shell_data.
Now change directories to untrimmed_fastq in shell_data:
cd ~/shell_data
cd untrimmed_fastq
Using tab complete can be very helpful. However, it will only autocomplete a file or directory name if you’ve typed enough characters to provide a unique identifier for the file or directory you are trying to access.
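Tab completion itself is interactive, so it cannot be shown in a script. As a loose analogy only, the sketch below counts how many names in a scratch directory start with a given prefix, using a wildcard pattern (the * character) rather than tab completion. The two file names mirror the FASTQ files listed earlier, while the scratch directory and variable names are hypothetical.

```shell
# Recreate two file names from the lesson inside a throwaway directory.
scratch_dir=$(mktemp -d)
touch "$scratch_dir/SRR097977.fastq" "$scratch_dir/SRR098026.fastq"
cd "$scratch_dir"

# Count the names that start with each prefix (tr strips padding from wc).
matches_for_SR=$(ls SR* | wc -l | tr -d ' ')
matches_for_SRR0979=$(ls SRR0979* | wc -l | tr -d ' ')

echo "Names starting with SR: $matches_for_SR"
echo "Names starting with SRR0979: $matches_for_SRR0979"
```

A prefix that matches more than one name is ambiguous, so the shell can only complete as far as the characters those names share; a prefix that matches exactly one name can be completed in full.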
For example, if we now try to list the files whose names start with SR by using tab complete:
cd ~/shell_data
cd untrimmed_fastq
ls SR<tab>
The shell auto-completes your command to SRR09, because all file names in the directory that start with SR share this prefix. When you hit
<kbdTab