» 
Arabic Bulgarian Chinese Croatian Czech Danish Dutch English Estonian Finnish French German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Korean Latvian Lithuanian Malagasy Norwegian Persian Polish Portuguese Romanian Russian Serbian Slovak Slovenian Spanish Swedish Thai Turkish Vietnamese
Arabic Bulgarian Chinese Croatian Czech Danish Dutch English Estonian Finnish French German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Korean Latvian Lithuanian Malagasy Norwegian Persian Polish Portuguese Romanian Russian Serbian Slovak Slovenian Spanish Swedish Thai Turkish Vietnamese

definition - Simplified_molecular_input_line_entry_specification

definition of Wikipedia

   Advertizing ▼

Wikipedia

Simplified molecular input line entry specification

From Wikipedia, the free encyclopedia

Jump to: navigation, search
smiles
Filename extension.smi
Type of formatchemical file format
Generation of SMILES: Break cycles, then write as branches off a main backbone.

The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.

The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. In 2007, an open standard called "OpenSMILES" was developed by the Blue Obelisk open-source chemistry community. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc).

In August 2006, the IUPAC introduced the InChI as a standard for formula representation. SMILES is generally considered to have the advantage of being slightly more human-readable than InChI; it also has a wide base of software support with extensive theoretical (e.g., graph theory) backing.

Contents

Terminology

The term SMILES refers to a line notation for encoding molecular structures and specific instances should strictly be called SMILES strings. However, the term SMILES is also commonly used to refer to both a single SMILES string and a number of SMILES strings; the exact meaning is usually apparent from the context. The terms Canonical and Isomeric can lead to some confusion when applied to SMILES. The terms describe different attributes of SMILES strings and are not mutually exclusive.

Typically, a number of equally valid SMILES can be written for a molecule. For example, CCO, OCC and C(O)C all specify the structure of ethanol. Algorithms have been developed to ensure the same SMILES is generated for a molecule regardless of the order of atoms in the structure. This SMILES is unique for each structure, although dependent on the canonicalisation algorithm used to generate it, and is termed the Canonical SMILES. These algorithms first convert the SMILES to an internal representation of the molecular structure and do not simply manipulate strings as is sometimes thought. Various algorithms for generating Canonical SMILES have been developed, including those by Daylight Chemical Information Systems, OpenEye Scientific Software, MEDIT and Chemical Computing Group. A common application of Canonical SMILES is indexing and ensuring uniqueness of molecules in a database.

SMILES notation allows the specification of configuration at tetrahedral centers, and double bond geometry. These are structural features that cannot be specified by connectivity alone and SMILES which encode this information are termed Isomeric SMILES. A notable feature of these rules is that they allow rigorous partial specification of chirality. The term Isomeric SMILES is also applied to SMILES in which isotopes are specified.

Graph-based definition

In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.

Examples

Atoms

Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance the SMILES for water is simply O.

An atom holding one or more electrical charge(s) is enclosed in brackets (whichever is), followed by the symbol H if it is bonded to one or more atoms of hydrogen (these ones are followed by their number so, except if there is one only : NH4 for ammonium), then by the sign '+' for an positive charge or by '-' for an negative charge. Number of charges is specified after the sign (except if there is one only) ; however, it's also possible write the sign as many as the ion hold of charges : instead of "Ti+4", you can also write "Ti++++" (Titanium IV, Ti4+). Thus, the hydroxide anion is represented by [OH-], the oxonium cation is [OH3+] and the cobalt III cation (Co3+) is either [Co+3] or [Co+++].

Bonds

Bonds between aliphatic atoms are assumed to be single unless specified otherwise and are implied by adjacency in the SMILES. For example the SMILES for ethanol can be written as CCO. Ring closure labels are used to indicate connectivity between non-adjacent atoms in the SMILES, which for cyclohexane and dioxane can be written as C1CCCCC1 and O1CCOCC1 respectively. For a second ring, the label will be 2 (naphthalene : c1cccc2c1cccc2), and so on. After 9, the label must be preceded by a '%', in order to differentiate it from two different labels bonded to the same atom (~C12~ will mean the atom of carbon hold the ring closure labels 1 and 2, whereas ~C%12~ will indicate one label only, the 12). Double and triple bonds are represented by the symbols '=' and '#' respectively as illustrated by the SMILES O=C=O (carbon dioxide) and C#N (hydrogen cyanide).

Aromaticity

Aromatic C, O, S and N atoms are shown in their lower case 'c', 'o', 's' and 'n' respectively. Benzene, pyridine and furan can be represented respectively by the SMILES c1ccccc1, n1ccccc1 and o1cccc1. Bonds between aromatic atoms are, by default, aromatic although these can be specified explicitly using the ':' symbol. Aromatic atoms can be singly bonded to each other and biphenyl can be represented by c1ccccc1-c2ccccc2. Aromatic nitrogen bonded to hydrogen, as found in pyrrole must be represented as [nH] and imidazole is written in SMILES notation as n1c[nH]cc1.

The Daylight and OpenEye algorithms for generating canonical SMILES differ in their treatment of aromaticity.

Visualization of 3-cyanoanisole as COc(c1)cccc1C#N.

Branching

Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform. Substituted rings can be written with the branching point in the ring as illustrated by the SMILES COc(c1)cccc1C#N (see depiction) and COc(cc1)ccc1C#N (see depiction) which encode the 3 and 4-cyanoanisole isomers. Writing SMILES for substituted rings in this way can make them more human-readable.

Stereochemistry

Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F (see depiction) is one representation of trans-difluoroethene, in which the fluorine atoms are on opposite sides of the double bond, whereas F/C=C\F (see depiction) is one possible representation of cis-difluoroethene, in which the Fs are on the same side of the double bond, as shown in the figure.

Configuration at tetrahedral carbon is specified by @ or @@. L-Alanine, the more common enantiomer of the amino acid alanine can be written as N[C@@H](C)C(=O)O (see depiction). The @@ specifier indicates that, when viewed from nitrogen along the bond to the chiral center, the sequence of substituents hydrogen (H), methyl (C) and carboxylate (C(=O)O) appear clockwise. D-Alanine can be written as N[C@H](C)C(=O)O (see depiction). The order of the substituents in the SMILES string is very important and D-alanine can also be encoded as N[C@@H](C(=O)O)C (see depiction).

Isotopes

Isotopes are specified with a number equal to the integer isotopic mass preceding the atomic symbol. Benzene in which one atom is carbon-14 is written as [14c]1ccccc1 and deuterochloroform is [2H]C(Cl)(Cl)Cl.

Application on some molecules

MoleculeStructureSMILES Formula
DinitrogenN≡NN#N
Methyl isocyanate (MIC)CH3–N=C=OCN=C=O
Copper(II) sulfateCu2+ SO42-[Cu+2].[O-]S(=O)(=O)[O-]
Œnanthotoxin (C17H22O2) CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO
Pyrethrin II (C21H28O3) COC(=O)C(\C)=C\C1C(C)(C)[C@H]1C(=O)O[C@@H]2C(C)=C(C(=O)C2)CC=CC=C
Aflatoxin B1 (C17H12O6) O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
Glucose (glucopyranose) (C6H12O6) OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1
Cuscutin alias Bergenin (a resin) (C14H16O9) OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H]2[C@@H]1c3c(O)c(OC)c(O)cc3C(=O)O2
A pheromone of the californian scale insect CC(=O)OCCC(/C)=C\C[C@H](C(C)=C)CCC=C
2S,5R-Chalcogran : a pheromone of the bark beetle Pityogenes chalcographus[1] CC[C@H](O1)CC[C@@]12CCCO2
Vanillin O=Cc1ccc(O)c(OC)c1
Melatonin (C13H16N2O2) CC(=O)NCCC1=CNc2c1cc(OC)cc2
Flavopereirin (C17H15N2) CCc(c1)ccc2[n+]1ccc3c2Nc4c3cccc4
Nicotine (C10H14N2) CN1CCC[C@H]1c2cccnc2
Alpha-thujone (C10H16O) CC(C)[C@@]12C[C@@H]1[C@@H](C)C(=O)C2
Thiamin (C12H17ClN4OS+)
(vitamine B1)
OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2


Illustration with a molecule with more of 9 rings, the Cephalostatin-1[2] (a steroidic trisdecacyclic pyrazine with the empirical formula C54H74N2O10 isolated from the Indian Ocean hemichordate Cephalodiscus gilchristi) :

Will give, starting by the most left methyle radical on the figure :

C[C@@](C)(O1)C[C@@H](O)[C@@]1(O2)[C@@H](C)[C@@H]3CC=C4[C@]3(C2)C(=O)C[C@H]5[C@H]4CC[C@@H](C6)[C@]5(C)Cc(n7)c6nc(C[C@@]89(C))c7C[C@@H]8CC[C@@H]%10[C@@H]9C[C@@H](O)[C@@]%11(C)C%10=C[C@H](O%12)[C@]%11(O)[C@H](C)[C@]%12(O%13)[C@H](O)C[C@@]%13(C)CO

(Notice the '%' in front of the indice of the ring closure labels upper to 9, see the section "Bonds", higher).

Other examples of SMILES

The SMILES notation is described extensively in the SMILES theory manual provided by Daylight Chemical Information Systems and a number of illustrative examples are presented. Daylight's depict utility provides users with the means to check their own examples of SMILES and is a valuable educational tool.

Extensions

SMARTS is a line notation for specification of substructural patterns in molecules. While it uses many of the same symbols as SMILES, it also allows specification of wildcard atoms and bonds, which can be used to define substructural queries for chemical database searching. One common misconception is that SMARTS-based substructural searching involves matching of SMILES and SMARTS strings. In fact, both SMILES and SMARTS strings are first converted to internal graph representations which are searched for subgraph isomorphism. SMIRKS is a line notation for specifying reaction transforms.

Conversion

SMILES can be converted back to 2-dimensional representations using Structure Diagram Generation algorithms (Helson, 1999). This conversion is not always unambiguous. Conversion to 3-dimensional representation is achieved by energy minimization approaches. There are many downloadable and web-based conversion utilities.

See also

References

  • Anderson, E.; Veith, G.D; Weininger, D. (1987) SMILES: A line notation and computerized interpreter for chemical structures. Report No. EPA/600/M-87/021. U.S. EPA, Environmental Research Laboratory-Duluth, Duluth, MN 55804
  • Weininger, D. (1988), SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules, J. Chem. Inf. Comput. Sci. 28, 31-36.
  • Weininger, D.; Weininger, A.; Weininger, J.L. (1989) SMILES. 2. Algorithm for generation of unique SMILES notation J. Chem. Inf. Comput. Sci. 29, 97-101.
  • Helson, H.E. (1999) Structure Diagram Generation In Rev. Comput. Chem. edited by Lipkowitz, K. B. and Boyd, D. B. Wiley-VCH, New York, pages 313-398.

External links

Specifications

SMILES related software utilities

 

All translations of Simplified_molecular_input_line_entry_specification


sensagent's content

  • definitions
  • synonyms
  • antonyms
  • encyclopedia

Dictionary and translator for handheld

⇨ New : sensagent is now available on your handheld

   Advertising ▼

sensagent's office

Shortkey or widget. Free.

Windows Shortkey: sensagent. Free.

Vista Widget : sensagent. Free.

Webmaster Solution

Alexandria

A windows (pop-into) of information (full-content of Sensagent) triggered by double-clicking any word on your webpage. Give contextual explanation and translation from your sites !

Try here  or   get the code

SensagentBox

With a SensagentBox, visitors to your site can access reliable information on over 5 million pages provided by Sensagent.com. Choose the design that fits your site.

Business solution

Improve your site content

Add new content to your site from Sensagent by XML.

Crawl products or adds

Get XML access to reach the best products.

Index images and define metadata

Get XML access to fix the meaning of your metadata.


Please, email us to describe your idea.

WordGame

The English word games are:
○   Anagrams
○   Wildcard, crossword
○   Lettris
○   Boggle.

Lettris

Lettris is a curious tetris-clone game where all the bricks have the same square shape but different content. Each square carries a letter. To make squares disappear and save space for other squares you have to assemble English words (left, right, up, down) from the falling squares.

boggle

Boggle gives you 3 minutes to find as many words (3 letters or more) as you can in a grid of 16 letters. You can also try the grid of 16 letters. Letters must be adjacent and longer words score better. See if you can get into the grid Hall of Fame !

English dictionary
Main references

Most English definitions are provided by WordNet .
English thesaurus is mainly derived from The Integral Dictionary (TID).
English Encyclopedia is licensed by Wikipedia (GNU).

Copyrights

The wordgames anagrams, crossword, Lettris and Boggle are provided by Memodata.
The web service Alexandria is granted from Memodata for the Ebay search.
The SensagentBox are offered by sensAgent.

Translation

Change the target language to find translations.
Tips: browse the semantic fields (see From ideas to words) in two languages to learn more.

last searches on the dictionary :

4983 online visitors

computed in 0.094s

   Advertising ▼

I would like to report:
section :
a spelling or a grammatical mistake
an offensive content(racist, pornographic, injurious, etc.)
a copyright violation
an error
a missing statement
other
please precise:

Advertize

Partnership

Company informations

My account

login

registration

   Advertising ▼